Abstract

The large deviations principle for the empirical measure for both continuous
and discrete time Markov processes is well known. Various expressions are
available for the rate function, but these expressions are usually given as the
solution to a variational problem, and in this sense are not explicit. An
interesting class of continuous time, reversible processes was identified in the
original work of Donsker and Varadhan for which an explicit expression is
possible. While this class includes many (reversible) processes of interest, it
excludes the case of continuous time pure jump processes, such as a reversible
finite state Markov chain. In this paper we study the large deviations principle
for the empirical measure of pure jump Markov processes and provide an explicit
formula for the rate function under reversibility.

1  Introduction 

Let $X(t)$ be a time homogeneous Markov process with Polish state space $S$, and let
$P(t,x,dy)$ be the transition function of $X(t)$. For $t \in [0,\infty)$, define $T_t$ by

\[ T_t f(x) = \int_S f(y)\, P(t,x,dy). \]

Then $T_t$ is a contraction semigroup on the Banach space of bounded, Borel measurable
functions on $S$ [6, Chapter 4.1]. We use $\mathcal{L}$ to denote the infinitesimal generator of
$T_t$ and $\mathcal{D}$ the domain of $\mathcal{L}$ (see [6, Chapter 1]). Hence for each bounded measurable
function $f \in \mathcal{D}$,

\[ \mathcal{L} f(x) = \lim_{t \downarrow 0} \frac{1}{t} \left[ \int_S f(y)\, P(t,x,dy) - f(x) \right]. \]

*The authors thank a referee for useful comments and a correction.

The empirical measure (or normalized occupation measure) up to time $T$ of the
Markov process $X(t)$ is defined by

\[ \eta_T(\cdot) = \frac{1}{T} \int_0^T \delta_{X(t)}(\cdot)\, dt. \tag{1.1} \]

Let $\mathcal{P}(S)$ be the metric space of probability measures on $S$ equipped with the
Lévy-Prohorov metric, which is compatible with the topology of weak convergence. For
$\eta \in \mathcal{P}(S)$, define

\[ I(\eta) = -\inf_{u \in \mathcal{D},\; u > 0} \int_S \frac{\mathcal{L}u}{u}\, d\eta. \tag{1.2} \]

It is easy to check that $I$ thus defined is lower semicontinuous under the topology of
weak convergence. Consider the following regularity assumption.

Condition 1.1 There exists a probability measure $\lambda$ on $S$ such that for $t > 0$ the
transition functions $P(t,x,dy)$ have densities with respect to $\lambda$, i.e.,

\[ P(t,x,dy) = p(t,x,y)\, \lambda(dy). \tag{1.3} \]

Under additional recurrence and transitivity conditions, Donsker and Varadhan
[2, 3] prove the following. For any open set $O \subset \mathcal{P}(S)$

\[ \liminf_{T \to \infty} \frac{1}{T} \log P(\eta_T(\cdot) \in O) \ge -\inf_{\eta \in O} I(\eta), \tag{1.4} \]

and for any closed set $C \subset \mathcal{P}(S)$

\[ \limsup_{T \to \infty} \frac{1}{T} \log P(\eta_T(\cdot) \in C) \le -\inf_{\eta \in C} I(\eta). \tag{1.5} \]

We refer to (1.4) as the large deviation lower bound and (1.5) as the large
deviation upper bound. Under ergodicity, the empirical measure $\eta_T$ converges to the
invariant distribution of the Markov process $X(t)$. The large deviation principle
characterizes this convergence through the associated rate function. While there are
many situations where an explicit formula for (1.2) would be useful, it is in general
difficult to solve the variational problem. The main existing results on this issue are
for the self-adjoint case in the continuous time setting, see [2, 9, 11]. Specifically,
suppose there is a $\sigma$-finite measure $\varphi$ on $S$, and that the densities in (1.3) satisfy the
following reversibility condition:

\[ p(t,x,y) = p(t,y,x) \quad \text{almost everywhere } (\varphi \times \varphi). \tag{1.6} \]

Then $T_t$ is self-adjoint. If we denote the closure of $\mathcal{L}$ by $\bar{\mathcal{L}}$ (see, e.g., [6, p. 16]) and
the domain of $\bar{\mathcal{L}}$ by $\mathcal{D}(\bar{\mathcal{L}})$, then $\bar{\mathcal{L}}$ is self-adjoint and negative semidefinite (since $T_t$
is a contraction). We denote by $(-\bar{\mathcal{L}})^{1/2}$ the canonical positive semidefinite square
root of $-\bar{\mathcal{L}}$ [10, Chapter 12]. Let $\mathcal{D}_{1/2}$ be the domain of $(-\bar{\mathcal{L}})^{1/2}$. Donsker and
Varadhan [2, Theorem 5] show under certain conditions that $I$ defined by (1.2) has
the following properties: $I(\mu) < \infty$ if and only if $\mu \ll \varphi$ and $(d\mu/d\varphi)^{1/2} \in \mathcal{D}_{1/2}$, and
with $f = d\mu/d\varphi$ and $g = f^{1/2}$,

\[ I(\mu) = \left\| (-\bar{\mathcal{L}})^{1/2} g \right\|^2, \tag{1.7} \]

where $\|\cdot\|$ denotes the $L^2$ norm with respect to $\varphi$. Typically, $\varphi$ is taken to be the
invariant distribution of the process.

It should be noted that this explicit formula does not apply to one of the simplest
Markov processes, namely, continuous time Markov jump processes with bounded
infinitesimal generators. Let $\mathcal{B}(S)$ be the Borel $\sigma$-algebra on $S$ and let $\alpha(x,\Gamma)$ be
a transition kernel on $S \times \mathcal{B}(S)$. Let $B(S)$ denote the space of bounded Borel
measurable functions on $S$ and let $q \in B(S)$ be nonnegative. Then

\[ \mathcal{L} f(x) = q(x) \int_S \left( f(y) - f(x) \right) \alpha(x,dy) \tag{1.8} \]

defines a bounded linear operator on $B(S)$ and $\mathcal{L}$ is the generator of a Markov process
that can be constructed as follows. Let $\{X_n, n \in \mathbb{N}\}$ be a Markov chain in $S$ with
transition probability $\alpha(x,\Gamma)$, i.e.

\[ P(X_{n+1} \in \Gamma \mid X_0, X_1, \ldots, X_n) = \alpha(X_n, \Gamma) \tag{1.9} \]

for all $\Gamma \in \mathcal{B}(S)$ and $n \in \mathbb{N}$. Let $\tau_1, \tau_2, \ldots$ be independent and exponentially
distributed with mean 1, and independent of $\{X_n, n \in \mathbb{N}\}$. Define a sojourn time $s_i$ for each
$i = 1, 2, \ldots$ by

\[ q(X_{i-1})\, s_i = \tau_i. \tag{1.10} \]

Then

\[ X(t) = X_n \quad \text{for } \sum_{i=1}^{n} s_i \le t < \sum_{i=1}^{n+1} s_i \]

(with the convention $\sum_{i=1}^{0} s_i = 0$) defines a Markov process $\{X(t), t \in [0,\infty)\}$ with
infinitesimal generator $\mathcal{L}$, and we call this process a Markov jump process.
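As an illustration of this construction, the following is a minimal sketch for a finite state space. It draws the embedded chain from $\alpha$, rescales mean-one exponential variables by $q$ as in (1.10), and accumulates the empirical measure (1.1). The function name `simulate_occupation`, the list-based encoding of $q$ and $\alpha$, and the restriction to finitely many states are assumptions made only for this example.

```python
import random


def simulate_occupation(q, alpha, x0, T, rng):
    """Empirical measure eta_T of a finite-state Markov jump process.

    q[x]     : jump intensity at state x (assumed strictly positive)
    alpha[x] : row of transition probabilities alpha(x, .) of the embedded chain
    x0       : initial state, T : time horizon, rng : random.Random instance
    """
    n = len(q)
    occupation = [0.0] * n
    t, x = 0.0, x0
    while True:
        tau = rng.expovariate(1.0)   # tau_i, exponential with mean 1
        s = tau / q[x]               # sojourn time from q(X_{i-1}) s_i = tau_i
        if t + s >= T:               # last sojourn is truncated at time T
            occupation[x] += T - t
            break
        occupation[x] += s
        t += s
        # next state of the embedded chain, drawn from alpha(x, .)
        x = rng.choices(range(n), weights=alpha[x])[0]
    return [w / T for w in occupation]


if __name__ == "__main__":
    rng = random.Random(0)
    # hypothetical two-state example: the chain alternates, q = (1, 2)
    print(simulate_occupation([1.0, 2.0], [[0.0, 1.0], [1.0, 0.0]], 0, 1000.0, rng))
```

In this two-state example the invariant distribution of the process is proportional to $(1/q(0), 1/q(1))$, so for large $T$ the output should be close to $(2/3, 1/3)$.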

A very simple special case is as follows. Using the notation above, assume $S =
[0,1]$, $q \equiv 1$ and for each $x \in [0,1]$, $\alpha(x,\cdot)$ is the uniform distribution on $[0,1]$. The
infinitesimal generator $\mathcal{L}$ defined in (1.8) reduces to

\[ \mathcal{L} f(x) = \int_0^1 f(y)\, dy - f(x), \]

which is clearly self-adjoint with respect to Lebesgue measure. If $C$ is the collection
of all Dirac measures on $S$, then $C$ is closed under the topology of weak convergence
on $\mathcal{P}(S)$. Hence a large deviation upper bound would imply

\[ \limsup_{T \to \infty} \frac{1}{T} \log P(\eta_T(\cdot) \in C) \le -\inf_{\mu \in C} I(\mu). \tag{1.11} \]

However, the probability that the very first exponential holding time is bigger than
$T$ is exactly $\exp\{-T\}$, and when this happens, the empirical measure is a Dirac
measure located at some point that is uniformly distributed on $[0,1]$. Hence

\[ \liminf_{T \to \infty} \frac{1}{T} \log P(\eta_T(\cdot) \in C) \ge \liminf_{T \to \infty} \frac{1}{T} \log P(\tau_1 > T) = -1. \]

In fact, we will prove later that the rate function for the empirical measure of this
Markov jump process never exceeds 1. However, if the upper bound held with the
function defined in (1.7), one would have $I(\delta_a) = \infty$ for $a \in [0,1]$, and by (1.11)

\[ \limsup_{T \to \infty} \frac{1}{T} \log P(\eta_T(\cdot) \in C) = -\infty, \]

which is impossible.

This example shows that this type of Markov jump process is not covered by
[2, 3]. In fact, the transition function $P(t,x,dy)$ takes the form

\[ P(t,x,dy) = e^{-t}\, \delta_x(dy) + \left(1 - e^{-t}\right) 1_{[0,1]}(y)\, dy, \]

which means that we cannot find a reference probability measure $\lambda$ on $S$ such that
$P(t,x,\cdot)$ has a density with respect to $\lambda(\cdot)$ for almost all $x \in S$ and $t > 0$. This
violates Condition 1.1 used in [2, 3], and also violates the form of reversibility
needed for (1.7).

A  condition  such  as  Condition  1.1  holds  naturally  for  Markov  processes  that 
possess  a  “diffusive”  term  in  the  dynamics,  which  is  not  the  case  for  Markov  jump 
processes,  and  the  form  of  the  rate  function  given  in  (1.7)  will  not  be  valid  for  these 
processes  either.  The  purpose  of  the  current  paper  is  to  establish  a  large  deviation 
principle  for  the  empirical  measures  of  reversible  Markov  jump  processes,  and  to 
provide  an  explicit  formula  for  the  rate  function  like  the  one  given  in  (1.7).  We 
also  show  why  the  boundedness  of  the  rate  function  results  from  the  fact  that  tilting 
of  the  exponential  holding  times  with  bounded  relative  entropy  cost  can  be  used 
for  target  measures  that  are  not  absolutely  continuous  with  respect  to  the  invariant 
distribution. 

Finally we mention that [1] evaluates (1.2) for certain classes of measures when $\mathcal{L}$
is the generator of a jump Markov process satisfying various conditions. However, it
does not present an expression for an arbitrary measure, and indeed it appears that
the authors are unaware that (1.7) is not the correct rate function for such processes,
or that the large deviation principle had not been established.

The  paper  is  organized  as  follows.  Section  2  presents  our  assumptions  on  the 
process.  In  Section  3  we  state  the  main  result,  Theorem  3.1.  The  proof  of  Theorem 
3.1  is  divided  into  two  sections,  Section  4  for  the  upper  bound  and  Section  5  for 
the  lower  bound.  In  the  final  section,  we  discuss  the  special  feature  of  Markov  jump 
processes  that  leads  to  the  boundedness  of  the  rate  function. 




2  Assumptions 


Our first assumption is that the Polish state space $S$ is compact. While compactness
is not needed, it lets us focus on the novel features of the problem. For standard
techniques to deal with the non-compact case see, e.g., [3].

A construction of Markov jump processes was given in the Introduction, and
we continue to use the notation introduced there. The jump intensity $q$ in (1.8) is
assumed to be continuous on $S$, and there exist $0 < K_1 \le K_2 < \infty$ such that

\[ K_1 \le q(x) \le K_2. \tag{2.1} \]

Reversibility seems necessary to obtain an explicit formula for the rate function,
and we will make such an assumption. Recall that $\mathcal{D}$ is the domain of $\mathcal{L}$.


Condition 2.1 $\mathcal{L}$ is self-adjoint (or reversible) under $\pi$ in the following sense: for
any $f, g \in \mathcal{D}$

\[ \int_S \left( \mathcal{L} f(x) \right) g(x)\, \pi(dx) = \int_S \left( \mathcal{L} g(x) \right) f(x)\, \pi(dx). \tag{2.2} \]

An equivalent condition for (2.2) to hold is the "detailed balance" condition, i.e.,
for $\pi$-a.e. $x, y \in S$

\[ q(x)\, \alpha(x,dy)\, \pi(dx) = q(y)\, \alpha(y,dx)\, \pi(dy). \tag{2.3} \]

Note that (2.3) directly implies $\int_S (\mathcal{L} f(x))\, \pi(dx) = 0$ for all $f \in \mathcal{D}$.

To ensure ergodicity of $X(t)$, we need several conditions on the transition
function $\alpha$ in (1.9). Recall that $\mathcal{P}(S)$ is the metric space of probability measures on $S$
equipped with the Lévy-Prohorov metric, which is compatible with the topology of weak
convergence.

Condition 2.2 $\alpha$ satisfies the Feller property. That is, $\alpha(x,\cdot) : S \to \mathcal{P}(S)$ is
continuous in $x$.


Remark 2.3 The Feller property and the compactness of $S$ guarantee $\alpha$ has an
invariant distribution [4, Proposition 8.3.4], which we denote by $\tilde{\pi}$. The boundedness
of $q$ enables us to define a probability measure $\pi$ according to

\[ \pi(A) = \frac{\int_A \frac{1}{q(x)}\, \tilde{\pi}(dx)}{\int_S \frac{1}{q(x)}\, \tilde{\pi}(dx)}. \tag{2.4} \]

Since $\tilde{\pi}$ is invariant under $\alpha$, i.e., $\tilde{\pi}(\cdot) = \int_S \alpha(x,\cdot)\, \tilde{\pi}(dx)$, we have

\[ \int_S \left( \mathcal{L} f(x) \right) \pi(dx) = \frac{1}{\int_S \frac{1}{q(x)}\, \tilde{\pi}(dx)} \int_S \int_S \left[ f(y) - f(x) \right] \alpha(x,dy)\, \tilde{\pi}(dx) = 0. \]

By Echeverria's Theorem [6, Theorem 4.9.17], $\pi$ is an invariant distribution of $X(t)$.

Condition 2.4 $\alpha$ satisfies the following transitivity condition. There exist positive
integers $l_0$ and $n_0$ such that for all $x$ and $\zeta$ in $S$

\[ \sum_{i=l_0}^{\infty} \frac{1}{2^i}\, \alpha^{(i)}(x,dy) \ll \sum_{j=n_0}^{\infty} \frac{1}{2^j}\, \alpha^{(j)}(\zeta,dy), \]

where $\alpha^{(k)}$ denotes the $k$-step transition probability.

Remark 2.5 Under this condition, $\tilde{\pi}$ is the unique invariant distribution of $\alpha$ [4,
Lemma 8.6.2]. Thus $\pi$ defined by (2.4) is the unique probability distribution that
satisfies $\int_S (\mathcal{L} f(x))\, \pi(dx) = 0$, and hence by [6, Theorem 4.9.17] is the unique invariant
distribution of $X(t)$.
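Remarks 2.3 and 2.5 can be checked numerically on a finite state space. The sketch below is only an illustration under that assumption: it approximates the invariant distribution of the embedded chain by iterating the lazy kernel $(\alpha + \mathrm{Id})/2$ (which has the same invariant distribution and avoids periodicity), reweights by $1/q$ as in (2.4), and evaluates $\int_S (\mathcal{L}f)\, d\pi$. The helper names and the three-state example are hypothetical.

```python
def invariant_of_chain(alpha, iters=2000):
    """Approximate the invariant distribution of the embedded chain alpha."""
    n = len(alpha)
    p = [1.0 / n] * n
    for _ in range(iters):
        # one step of the lazy kernel (alpha + Id)/2
        p = [0.5 * p[y] + 0.5 * sum(p[x] * alpha[x][y] for x in range(n))
             for y in range(n)]
    return p


def invariant_of_process(q, pi_chain):
    """Reweight the chain's invariant law by 1/q and normalize, as in (2.4)."""
    w = [p / qx for p, qx in zip(pi_chain, q)]
    z = sum(w)
    return [v / z for v in w]


def generator_mean(q, alpha, pi, f):
    """Return sum_x pi(x) (L f)(x), with L f(x) = q(x) sum_y (f(y) - f(x)) alpha(x, y)."""
    n = len(q)
    return sum(pi[x] * q[x] * sum((f[y] - f[x]) * alpha[x][y] for y in range(n))
               for x in range(n))


# Hypothetical three-state example (not taken from the paper).
Q = [1.0, 2.0, 4.0]
ALPHA = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]

if __name__ == "__main__":
    pi = invariant_of_process(Q, invariant_of_chain(ALPHA))
    print(pi, generator_mean(Q, ALPHA, pi, [1.0, -2.0, 5.0]))
```

Here the embedded chain is doubly stochastic, so its invariant distribution is uniform and the reweighted measure is proportional to $(1, 1/2, 1/4)$.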

Condition 2.6 There exists an integer $N$ and a positive real number $c$ such that

\[ \alpha^{(N)}(x,\cdot) \le c\, \pi(\cdot) \]

for all $x \in S$.

Remark 2.7 This type of assumption is common in the large deviation analysis of
empirical measures. See e.g., [5, Hypothesis 1.1].

Condition 2.8 The support of $\pi$ is $S$.


Remark 2.9 This condition guarantees that any probability measure $\eta \in \mathcal{P}(S)$ can
be approximated by measures that are absolutely continuous with respect to $\pi$. Indeed,
given $\delta > 0$ let $\{x_j, N_j, j = 1, \ldots, J\}$ be such that $J < \infty$, $x_j \in N_j \in \mathcal{B}(S)$, the $N_j$
are disjoint, $\cup_{j=1}^{J} N_j = S$, $\pi(N_j) > 0$ and $\sup\{d(x_j, y) : y \in N_j\} < \delta$ for $j = 1, \ldots, J$
(this can be done by an open covering argument). Given any $\eta \in \mathcal{P}(S)$ and $A \in \mathcal{B}(S)$
let

\[ \eta^{\delta}(A) = \sum_{j=1}^{J} \frac{\pi(A \cap N_j)}{\pi(N_j)}\, \eta(N_j). \]

Then $\eta^{\delta}$ is absolutely continuous with respect to $\pi$. Since $\eta^{\delta}(N_j) = \eta(N_j)$ and
$\sup\{d(x_j, y) : y \in N_j\} < \delta$, $\eta^{\delta} \to \eta$ in the weak topology as $\delta \to 0$.


Remark  2.10  Condition  2.8  excludes  the  existence  of  transient  states.  Although  one 
can  obtain  an  LDP  for  X  ( t )  that  has  transient  states,  one  would  end  up  with  a  rate 
function  that  depends  on  the  initial  state. 


3  A  large  deviation  principle 

3.1  Definition  of  the  rate  function 

In this subsection, we define the rate function $I$. In later sections we prove that $I$
thus defined is the correct form of the large deviation rate function for the empirical
measures of the Markov jump processes. All conditions stated in Section 2 will be
assumed throughout the rest of the paper. We wish to study the large deviation
principle for the empirical measures $\eta_T \in \mathcal{P}(S)$ defined by (1.1). Under compactness
of $S$ and Condition 2.2, $\eta_T$ converges in distribution to an invariant distribution of
$\mathcal{L}$. As pointed out in Remark 2.5, $\pi$ is the unique invariant distribution of $\mathcal{L}$, and
thus $\eta_T$ converges in distribution to $\pi$. Let $H$ be the collection of all distributions
that are absolutely continuous with respect to $\pi$, i.e.

\[ H = \{ \eta \in \mathcal{P}(S) : \eta \ll \pi \}. \tag{3.1} \]

For $\eta \in H$, and assuming that the integral is well defined, consider

\[ -\int_S \theta^{1/2}(x)\, \mathcal{L}\left(\theta^{1/2}\right)(x)\, \pi(dx), \]

where $\theta = d\eta/d\pi$. This is a rewriting of $\|(-\mathcal{L})^{1/2} g\|^2$ in (1.7). By inserting the form
of $\mathcal{L}$ from (1.8), we obtain the candidate rate function

\[ I(\eta) = \int_S q(x)\, \eta(dx) - \int_{S \times S} \theta^{1/2}(x)\, \theta^{1/2}(y)\, q(x)\, \alpha(x,dy)\, \pi(dx). \tag{3.2} \]

Note that by applying (2.3) and using the Cauchy-Schwarz inequality, one can
prove that $I$ defined by (3.2) is nonnegative. Recall that $K_2$ is the upper bound of $q$
as in (2.1), and thus $I$ is bounded above by $K_2$. In addition, it is straightforward to
show that $I$ is convex on $H$.

We want to extend the definition of $I$ to all measures in $\mathcal{P}(S)$. As pointed
out in Remark 2.9, $H$ is dense in $\mathcal{P}(S)$ under the topology of weak convergence.
Hence we can extend the definition of $I$ to all of $\mathcal{P}(S)$ via lower semicontinuous
regularization with respect to the topology of weak convergence. Thus if $\eta_n \to \eta$
weakly and $\{\eta_n\} \subset H$, then $\liminf_{n \to \infty} I(\eta_n) \ge I(\eta)$, and equality holds for at least
one such sequence. This extension guarantees that the extended $I$ is convex, lower
semicontinuous and bounded above by $K_2$ on all of $\mathcal{P}(S)$. The compactness of $S$
and the lower semicontinuity of $I$ ensure that $I$ has compact level sets. Being a
nonnegative, lower semicontinuous function with compact level sets, $I$ indeed is a
valid large deviation rate function.
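For a finite state space the integrals in (3.2) become sums, and the stated properties of $I$ can be observed directly. The following sketch is an illustration only; the function `rate_function` and the three-state example (chosen so that detailed balance (2.3) holds) are assumptions of this example, not objects from the paper.

```python
import math


def rate_function(eta, q, alpha, pi):
    """Evaluate (3.2) on a finite state space, assuming eta << pi."""
    n = len(q)
    # theta = d(eta)/d(pi); states with pi(x) = 0 must carry no eta-mass
    theta = [eta[x] / pi[x] if pi[x] > 0.0 else 0.0 for x in range(n)]
    linear = sum(q[x] * eta[x] for x in range(n))
    cross = sum(math.sqrt(theta[x] * theta[y]) * q[x] * alpha[x][y] * pi[x]
                for x in range(n) for y in range(n))
    return linear - cross


# Hypothetical reversible example: q(x) alpha(x, y) pi(x) = 2/7 for all x != y.
Q = [1.0, 2.0, 4.0]
ALPHA = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
PI = [4.0 / 7.0, 2.0 / 7.0, 1.0 / 7.0]

if __name__ == "__main__":
    print(rate_function(PI, Q, ALPHA, PI))               # invariant distribution
    print(rate_function([1.0, 0.0, 0.0], Q, ALPHA, PI))  # point mass at state 0
    print(rate_function([0.2, 0.3, 0.5], Q, ALPHA, PI))
```

For a point mass at a state $a$ with $\pi(\{a\}) > 0$, formula (3.2) reduces to $q(a)(1 - \alpha(a,\{a\}))$, which is finite and at most $K_2$, in contrast with the value $\infty$ that (1.7) would assign in the example of the Introduction.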

We  have  finished  the  definition  of  the  rate  function  I,  and  are  now  ready  to  state 
the  large  deviation  principle. 

3.2  A  large  deviation  principle 

Our  main  result  is  the  following: 

Theorem 3.1 Let $X(t)$ be a Markov jump process satisfying all the assumptions in
Section 2. Let $I$ be defined as in Section 3.1. Then the large deviation bounds (1.4)
and (1.5) hold.

To prove Theorem 3.1, it suffices to show the equivalent Laplace principle [4,
Theorem 1.2.3]. Specifically, we establish that for any bounded continuous function
$F : \mathcal{P}(S) \to \mathbb{R}$

\[ \lim_{T \to \infty} -\frac{1}{T} \log E\left[ \exp\{ -T F(\eta_T) \} \right] = \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right]. \tag{3.3} \]

By adding a constant to both sides of (3.3) we can assume $F \ge 0$ and do that for the
rest of the paper. The proof is based on a weak convergence approach and is split
into two parts: a Laplace upper bound and a Laplace lower bound.

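Relative entropy plays a key role in the proof, and we hence state the definition
and a few important properties. Details can be found in [4].

Definition 1 Let $(\mathcal{V}, \mathcal{A})$ be a measurable space. For $\vartheta \in \mathcal{P}(\mathcal{V})$, the relative entropy
$R(\cdot \| \vartheta)$ is a mapping from $\mathcal{P}(\mathcal{V})$ into the extended real numbers. It is defined by

\[ R(\gamma \| \vartheta) = \int_{\mathcal{V}} \left( \log \frac{d\gamma}{d\vartheta} \right) d\gamma \]

when $\gamma \in \mathcal{P}(\mathcal{V})$ is absolutely continuous with respect to $\vartheta$ and $\log d\gamma/d\vartheta$ is integrable
with respect to $\gamma$. Otherwise we set $R(\gamma \| \vartheta) = \infty$.

If $\mathcal{V}$ is a Polish space and $\mathcal{A}$ the associated $\sigma$-algebra, then $R(\cdot \| \cdot)$ is nonnegative,
jointly convex and jointly lower semicontinuous (with respect to the weak topology
on $\mathcal{P}(\mathcal{V})^2$). We state the following two properties of relative entropy.

Lemma 3.2 (Variational formula) Let $(\mathcal{V}, \mathcal{A})$ be a measurable space, $k$ a bounded
measurable function mapping $\mathcal{V}$ into $\mathbb{R}$, and $\vartheta$ a probability measure on $\mathcal{V}$. The
following conclusions hold.

(a) We have the variational formula

\[ -\log \int_{\mathcal{V}} e^{-k}\, d\vartheta = \inf_{\gamma \in \mathcal{P}(\mathcal{V})} \left\{ R(\gamma \| \vartheta) + \int_{\mathcal{V}} k\, d\gamma \right\}. \tag{3.4} \]

(b) The infimum in (3.4) is attained uniquely at $\gamma_0$ defined by

\[ \frac{d\gamma_0}{d\vartheta}(x) = \frac{e^{-k(x)}}{\int_{\mathcal{V}} e^{-k}\, d\vartheta}. \]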
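The variational formula is easy to verify on a finite set, where the integrals are sums. The sketch below is illustrative only: it evaluates the objective in (3.4) at the exponentially tilted measure (the standard form of the minimizer in part (b)) and at one other candidate. The function names and the numerical example are hypothetical.

```python
import math


def relative_entropy(gamma, theta):
    """R(gamma || theta) for probability vectors on a finite set."""
    total = 0.0
    for g, t in zip(gamma, theta):
        if g > 0.0:
            if t == 0.0:
                return math.inf   # gamma is not absolutely continuous w.r.t. theta
            total += g * math.log(g / t)
    return total


def tilted(theta, k):
    """Exponentially tilted measure with density proportional to exp(-k) w.r.t. theta."""
    w = [t * math.exp(-kv) for t, kv in zip(theta, k)]
    z = sum(w)
    return [v / z for v in w]


def objective(gamma, theta, k):
    """R(gamma || theta) + integral of k with respect to gamma."""
    return relative_entropy(gamma, theta) + sum(g * kv for g, kv in zip(gamma, k))


if __name__ == "__main__":
    theta = [0.2, 0.3, 0.5]
    k = [1.0, -0.5, 2.0]
    lhs = -math.log(sum(t * math.exp(-kv) for t, kv in zip(theta, k)))
    print(lhs, objective(tilted(theta, k), theta, k), objective(theta, theta, k))
```

The first two printed values should agree up to rounding, and the third (the objective at $\gamma = \vartheta$) should be no smaller.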

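Theorem 3.3 (Chain rule) Let $\mathcal{X}$ and $\mathcal{Y}$ be Polish spaces and $\beta$ and $\gamma$ probability
measures on $\mathcal{X} \times \mathcal{Y}$. We denote by $[\beta]_1$ and $[\gamma]_1$ the first marginals of $\beta$ and $\gamma$ and
by $\beta(dy|x)$ and $\gamma(dy|x)$ the stochastic kernels on $\mathcal{Y}$ given $\mathcal{X}$ for which we have the
decompositions

\[ \beta(dx \times dy) = [\beta]_1(dx) \otimes \beta(dy|x) \quad \text{and} \quad \gamma(dx \times dy) = [\gamma]_1(dx) \otimes \gamma(dy|x). \]

Then the function mapping $x \in \mathcal{X} \to R(\beta(\cdot|x) \| \gamma(\cdot|x))$ is measurable and

\[ R(\beta \| \gamma) = R([\beta]_1 \| [\gamma]_1) + \int_{\mathcal{X}} R(\beta(\cdot|x) \| \gamma(\cdot|x))\, [\beta]_1(dx). \]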

We devote the next two sections to proving the Laplace upper bound and the
Laplace lower bound, respectively.

4 Proof of the Laplace upper bound

In this section, we prove the Laplace upper bound part of (3.3), i.e.

\[ \liminf_{T \to \infty} -\frac{1}{T} \log E\left[ \exp\{ -T F(\eta_T) \} \right] \ge \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right]. \tag{4.1} \]

Recalling the construction of $X(t)$ in the Introduction, we define a random integer
$R_T$ as the index when the total "waiting time" first exceeds $T$, i.e.

\[ \sum_{i=1}^{R_T - 1} s_i \le T < \sum_{i=1}^{R_T} s_i. \tag{4.2} \]

Then the empirical measure $\eta_T$ can be written as

\[
\begin{aligned}
\eta_T(\cdot) &= \frac{1}{T} \int_0^T \delta_{X(t)}(\cdot)\, dt \\
&= \frac{1}{T} \left[ \sum_{i=1}^{R_T - 1} \delta_{X_{i-1}}(\cdot)\, s_i + \delta_{X_{R_T - 1}}(\cdot) \left( T - \sum_{i=1}^{R_T - 1} s_i \right) \right] \\
&= \frac{1}{T} \left[ \sum_{i=1}^{R_T - 1} \delta_{X_{i-1}}(\cdot)\, \frac{\tau_i}{q(X_{i-1})} + \delta_{X_{R_T - 1}}(\cdot) \left( T - \sum_{i=1}^{R_T - 1} \frac{\tau_i}{q(X_{i-1})} \right) \right].
\end{aligned}
\tag{4.3}
\]

The proof of (4.1) will be partitioned into two cases: $R_T/T > C$ and $0 \le R_T/T \le C$,
where $C$ will be sent to $\infty$ after sending $T \to \infty$.


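4.1 The case $R_T/T > C$

Let $F : \mathcal{P}(S) \to \mathbb{R}$ be nonnegative and continuous. Then since $F \ge 0$,

\[
\begin{aligned}
-\frac{1}{T} \log E\left[ 1_{(C,\infty)}(R_T/T)\, e^{-T F(\eta_T)} \right]
&\ge -\frac{1}{T} \log P\left\{ \sum_{i=1}^{\lfloor TC \rfloor + 1} s_i \le T \right\} \\
&= -\frac{1}{T} \log P\left\{ \sum_{i=1}^{\lfloor TC \rfloor + 1} \frac{\tau_i}{q(X_{i-1})} \le T \right\} \\
&\ge -\frac{1}{T} \log P\left\{ \sum_{i=1}^{\lfloor TC \rfloor + 1} \tau_i \le K_2 T \right\}.
\end{aligned}
\]

Using Chebyshev's inequality, for any $a \in (0,\infty)$

\[
\begin{aligned}
P\left\{ \sum_{i=1}^{\lfloor TC \rfloor + 1} \tau_i \le K_2 T \right\}
&= P\left\{ e^{-a \sum_{i=1}^{\lfloor TC \rfloor + 1} \tau_i} \ge e^{-a K_2 T} \right\} \\
&\le e^{a K_2 T}\, E\left[ e^{-a \sum_{i=1}^{\lfloor TC \rfloor + 1} \tau_i} \right] \\
&= e^{a K_2 T}\, e^{(\lfloor TC \rfloor + 1) \log \frac{1}{1+a}}.
\end{aligned}
\]

For the last equality we have used that if $\tau$ is exponentially distributed with mean 1
then $E e^{a\tau} = 1/(1-a)$ for any $a \in (-\infty, 1)$. Combining the last two inequalities,

\[
\begin{aligned}
\liminf_{T \to \infty} -\frac{1}{T} \log E\left[ 1_{(C,\infty)}(R_T/T) \cdot \exp\{ -T F(\eta_T) \} \right]
&\ge \sup_{a \in (0,\infty)} \left[ -K_2 a + C \log(1+a) \right] \\
&= -C + C \log C + K_2 - C \log K_2,
\end{aligned}
\]

where the last equality holds once $C > K_2$, the supremum then being attained at $a = C/K_2 - 1$.

Note that $-C + C \log C + K_2 - C \log K_2 \to \infty$ as $C \to \infty$.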
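The closed form of the supremum can be checked numerically. The sketch below compares a grid search over $a$ with the expression $-C + C \log C + K_2 - C \log K_2$; it assumes $C > K_2$, so that the maximizer $a = C/K_2 - 1$ is positive. The function names and the particular values of $C$ and $K_2$ are choices made only for this illustration.

```python
import math


def bound(a, C, K2):
    """The quantity -K2 * a + C * log(1 + a) maximized over a > 0."""
    return -K2 * a + C * math.log(1.0 + a)


def closed_form(C, K2):
    """Value of the supremum, valid when C > K2."""
    return -C + C * math.log(C) + K2 - C * math.log(K2)


def grid_sup(C, K2, a_max=50.0, steps=200_000):
    """Brute-force maximum of bound(a) over a uniform grid in (0, a_max]."""
    return max(bound(a_max * i / steps, C, K2) for i in range(1, steps + 1))


if __name__ == "__main__":
    for C in (5.0, 20.0, 80.0):
        print(C, grid_sup(C, 2.0), closed_form(C, 2.0))
```

The printed pairs should agree closely, and the values grow with $C$, consistent with the divergence noted above.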

4.2 The case $0 \le R_T/T \le C$
4.2.1 A stochastic control representation

In this case we adapt a standard weak convergence argument, see [4] for details.
Specifically, we first establish a stochastic control representation for the left hand
side of (3.3) and then obtain a lower bound for the limit as $T \to \infty$. In the
representation, all distributions can be perturbed from their original form, but such
a perturbation pays a relative entropy cost. We distinguish the new distributions
and random variables by an overbar. In the following, the barred quantities are
constructed analogously to their unbarred counterparts. Hence $\bar{\tau}_i$ and $\bar{X}_i$ are chosen
recursively according to stochastic kernels $\bar{\sigma}_i(\cdot)$ and $\bar{\alpha}_i(\cdot)$, i.e., $\bar{\sigma}_i(\cdot)$ and $\bar{\alpha}_i(\cdot)$ are
conditional distributions that can depend on the whole past. Specifically, $\bar{\sigma}_i(\cdot)$ depends
on $\{\bar{X}_0, \bar{\tau}_1, \bar{X}_1, \bar{\tau}_2, \ldots, \bar{X}_{i-1}\}$ and $\bar{\alpha}_i(\cdot)$ depends on $\{\bar{X}_0, \bar{\tau}_1, \bar{X}_1, \bar{\tau}_2, \ldots, \bar{X}_{i-1}, \bar{\tau}_i\}$; $\bar{s}_i$ is
defined by (1.10) using $\bar{X}_i$ and $\bar{\tau}_i$; $\bar{R}_T$ is defined by (4.2) using $\bar{s}_i$, and $\bar{\eta}_T$ is defined by
(4.3) using $\bar{X}_i$, $\bar{\tau}_i$ and $\bar{R}_T$. It will be sufficient to consider any deterministic sequence
$\{r_T\}$ such that $0 \le r_T/T \le C$, and $r_T/T \to A$ for some $A \in [0, C]$ as $T \to \infty$. We
restrict consideration to controlled processes such that $\bar{R}_T = r_T$ by placing an infinite
cost penalty on controls which lead to any other outcome with positive probability.
Let $1(A)$ denote the indicator function of a set $A$, and recall that our convention is
$0 \cdot \infty = 0$. By applying [4, Proposition 4.5.1] and Theorem 3.3 the following is valid:

\[ -\frac{1}{T} \log E\left[ \exp\left\{ -T F(\eta_T) - T \cdot \infty \cdot 1(r_T \ne R_T) \right\} \right] \tag{4.4} \]

\[
\begin{aligned}
&= -\frac{1}{T} \log E \exp\left\{ -T \left[ F(\eta_T) + \infty \cdot 1\left( \left\{ \sum_{i=1}^{r_T - 1} s_i \le T < \sum_{i=1}^{r_T} s_i \right\}^c \right) \right] \right\} \\
&= \inf E\left[ F(\bar{\eta}_T) + \infty \cdot 1\left( \left\{ \sum_{i=1}^{r_T - 1} \bar{s}_i \le T < \sum_{i=1}^{r_T} \bar{s}_i \right\}^c \right) + \frac{1}{T} \sum_{i=1}^{r_T} \left( R(\bar{\alpha}_{i-1} \| \alpha) + R(\bar{\sigma}_i \| \sigma) \right) \right],
\end{aligned}
\tag{4.5}
\]

where the infimum is taken over all control measures $\{\bar{\alpha}_i, \bar{\sigma}_i\}$. Since in Section 5 we
will prove a similar but more involved representation formula, Lemma 5.1, we omit
the proof of this representation. Due to the restriction $\bar{R}_T = r_T$, one can write $\bar{\eta}_T$ as

\[ \bar{\eta}_T(\cdot) = \frac{1}{T} \left[ \sum_{i=1}^{r_T - 1} \delta_{\bar{X}_{i-1}}(\cdot)\, \frac{\bar{\tau}_i}{q(\bar{X}_{i-1})} + \delta_{\bar{X}_{r_T - 1}}(\cdot) \left( T - \sum_{i=1}^{r_T - 1} \frac{\bar{\tau}_i}{q(\bar{X}_{i-1})} \right) \right]. \tag{4.6} \]


In the following proof, we repeatedly extract further subsequences of $T$. To keep
the notation concise, we abuse notation and use $T$ to denote all subsequences. Note
also that in proving a lower bound for (4.4) it suffices to consider a subsequence of
$T$ such that

\[ \sup_T -\frac{1}{T} \log E\left[ \exp\left\{ -T F(\eta_T) - T \cdot \infty \cdot 1(r_T \ne R_T) \right\} \right] < \infty. \tag{4.7} \]

We assume this condition for the rest of this subsection.

The relative entropy cost in (4.5) includes two parts, $RE_T^1 = \frac{1}{T} \sum_{i=1}^{r_T} R(\bar{\alpha}_{i-1} \| \alpha)$
and $RE_T^2 = \frac{1}{T} \sum_{i=1}^{r_T} R(\bar{\sigma}_i \| \sigma)$. We will prove that for any sequence of controls $\{\bar{\alpha}_i, \bar{\sigma}_i\}$
in (4.5)

\[ \liminf_{T \to \infty} E\left[ F(\bar{\eta}_T) + RE_T^1 + RE_T^2 \right] \ge \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right]. \tag{4.8} \]

Toward this end, it is enough to show that along any subsequence of $T$ such that
$r_T/T \to A$, we can extract a further subsequence along which (4.8) holds. In addition,
it suffices to consider only functions $F$ that besides being nonnegative, are also lower
semicontinuous and convex. This restriction is valid since $I$ is convex and lower
semicontinuous, and follows a standard argument in the large deviation literature.
The interested reader can find the details in [8].

In light of (4.5) and (4.7) we assume without loss of generality

\[ \sup_T E\left[ F(\bar{\eta}_T) + RE_T^1 + RE_T^2 \right] < \infty. \tag{4.9} \]

Since the proof of (4.8) is lengthy, we analyze each term on the left hand side of (4.8)
separately in the following subsections.


4.2.2 The term $RE_T^1$

The cost $RE_T^1$ comes from distorting the dynamics of the embedded Markov chain,
and indeed the analysis gives a very similar conclusion to that of an ordinary Markov
chain ([4, Chapter 8]). For any probability measure $\nu$ on $S \times S$ we will use the notations
$[\nu]_1$ and $[\nu]_2$ to denote the first and second marginals of $\nu$. We have the following
result for $RE_T^1$.

Lemma 4.1 Consider any sequence of controls $\{\bar{\alpha}_i, \bar{\sigma}_i\}$ in (4.5) such that (4.9) holds.
Along any subsequence of $T$ satisfying $r_T/T \to A$, define a sequence of random
probability measures on $S \times S$ via

\[ \mu_T(dx, dy) = \frac{1}{r_T} \sum_{i=1}^{r_T} \delta_{\bar{X}_{i-1}}(dx)\, \bar{\alpha}_{i-1}(dy). \]

Then one can extract a further subsequence such that $E\mu_T$ converges in distribution
to a probability measure $\mu$ on $S \times S$, and

\[ \liminf_{T \to \infty} E\left[ RE_T^1 \right] \ge A\, R(\mu \| [\mu]_1 \otimes \alpha). \]

Furthermore, if $A > 0$ then $\mu$ satisfies

\[ [\mu]_1 = [\mu]_2. \tag{4.10} \]

Proof. By the chain rule (Theorem 3.3) and the joint convexity of relative entropy

\[
\begin{aligned}
E\left[ RE_T^1 \right] &= E\left[ \frac{1}{T} \sum_{i=1}^{r_T} R(\bar{\alpha}_{i-1} \| \alpha) \right] \\
&= E\left[ \frac{1}{T} \sum_{i=1}^{r_T} R\left( \delta_{\bar{X}_{i-1}} \otimes \bar{\alpha}_{i-1} \,\|\, \delta_{\bar{X}_{i-1}} \otimes \alpha \right) \right] \\
&\ge E\left[ \frac{r_T}{T}\, R(\mu_T \| [\mu_T]_1 \otimes \alpha) \right] \\
&\ge \frac{r_T}{T}\, R(E\mu_T \| [E\mu_T]_1 \otimes \alpha).
\end{aligned}
\]

Since $S \times S$ is compact, for any subsequence of $T$ there exists a further subsequence
along which $E\mu_T$ converges weakly to a probability measure $\mu$. Under the Feller
property of $\alpha$ (Condition 2.2), $[E\mu_T]_1 \otimes \alpha$ converges weakly to $[\mu]_1 \otimes \alpha$. The lower
semicontinuity of relative entropy then implies

\[ \liminf_{T \to \infty} E\left[ RE_T^1 \right] \ge \liminf_{T \to \infty} \frac{r_T}{T}\, R(E\mu_T \| [E\mu_T]_1 \otimes \alpha) \ge A\, R(\mu \| [\mu]_1 \otimes \alpha). \]

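This finishes the first part of Lemma 4.1. For the second part, we employ a
standard martingale argument. Let $\mathcal{F}_i$ be the $\sigma$-algebra generated by the random
variables $\{(\bar{X}_0, \ldots, \bar{X}_i), (\bar{\tau}_1, \ldots, \bar{\tau}_i)\}$. Thus $\mathcal{F}_i$ is an increasing sequence of $\sigma$-algebras
and, since $\bar{\alpha}_i$ selects the conditional distribution of $\bar{X}_i$, for any bounded continuous
function $f$ on $S$

\[ E\left[ f(\bar{X}_i) - \int_S f(y)\, \bar{\alpha}_i(dy) \,\Big|\, \mathcal{F}_{i-1} \right] = 0. \]

Hence for integers $0 \le i < k \le r_T - 1$

\[ E\left[ \left( f(\bar{X}_i) - \int_S f(y)\, \bar{\alpha}_i(dy) \right) \left( f(\bar{X}_k) - \int_S f(y)\, \bar{\alpha}_k(dy) \right) \right] = 0, \]

and thus for any bounded continuous function $f$ on $S$

\[
\begin{aligned}
&E\left[ \int_{S \times S} f(x)\, \mu_T(dx, dy) - \int_{S \times S} f(y)\, \mu_T(dx, dy) \right]^2 \\
&\quad = E\left[ \frac{1}{r_T} \sum_{i=1}^{r_T} f(\bar{X}_{i-1}) - \frac{1}{r_T} \sum_{i=1}^{r_T} \int_S f(y)\, \bar{\alpha}_{i-1}(dy) \right]^2 \\
&\quad = \frac{1}{r_T^2} \sum_{i=1}^{r_T} E\left[ f(\bar{X}_{i-1}) - \int_S f(y)\, \bar{\alpha}_{i-1}(dy) \right]^2 \\
&\quad \le \frac{4 \|f\|_\infty^2}{r_T}.
\end{aligned}
\]

Since $0 < A = \lim_{T \to \infty} r_T/T$, we have $r_T/T \ge A/2$ for all $T$ large enough. Using
Chebyshev's inequality and the last display we conclude that $[\mu_T]_1 - [\mu_T]_2$ converges
to 0 in probability as $T \to \infty$, and therefore $[\mu]_1 = [\mu]_2$ with probability 1. This
concludes the second part of Lemma 4.1. ■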


4.2.3 The term $RE_T^2$

We now turn to the second cost $RE_T^2$. This cost comes from distorting the exponential
sojourn times. We introduce a function $\ell$ which is closely related to the relative
entropy of exponential distributions: $\ell(x) = x \log x - x + 1$ for any $x \ge 0$.

Lemma 4.2 Given any sequence of controls $\{\bar{\alpha}_i, \bar{\sigma}_i\}$, fix a subsequence of $T$ for which
the conclusions in Lemma 4.1 hold. Then we can extract a further subsequence along
which

\[ \liminf_{T \to \infty} E\left[ RE_T^2 \right] \ge \int_{S \times \mathbb{R}_+} \ell(u)\, \xi(dx, du). \]

Here $\xi$ is a finite measure on $S \times \mathbb{R}_+$ and is related to $\mu$ in Lemma 4.1 by

\[ \int_{\mathbb{R}_+} u\, \xi(dx, du) = A\, [\mu]_1(dx). \tag{4.11} \]

Before proving this lemma, we define $g : \mathbb{R}_+ \to \mathbb{R}$ by $g(b) = -\log b + b - 1$. The
functions $g$ and $\ell$ are related by

\[ g(x) = x\, \ell(1/x), \]

and $g$ has the following property.

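Lemma 4.3 Let $\sigma$ be an exponential distribution with mean 1. Then

\[ \inf\left\{ R(\gamma \| \sigma) : \int_{\mathbb{R}_+} u\, \gamma(du) = b \right\} = g(b). \tag{4.12} \]

Proof. Let $\sigma_b$ be the exponential distribution with mean $b$, i.e.,

\[ \sigma_b(du) = \frac{1}{b}\, e^{-u/b}\, du. \]

Then $\frac{d\sigma_b}{d\sigma}(u) = \frac{1}{b}\, e^{(1 - 1/b) u}$ for $u \ge 0$. Picking any $\gamma$ such that $R(\gamma \| \sigma) < \infty$ and
$\int_{\mathbb{R}_+} u\, \gamma(du) = b$,

\[
\begin{aligned}
R(\gamma \| \sigma) &= \int_{\mathbb{R}_+} \log\left( \frac{d\gamma}{d\sigma} \right) \gamma(du) \\
&= R(\gamma \| \sigma_b) + \int_{\mathbb{R}_+} \left[ -\log b + \left( 1 - \frac{1}{b} \right) u \right] \gamma(du) \\
&= R(\gamma \| \sigma_b) + g(b) \\
&\ge g(b),
\end{aligned}
\]

and the infimum in (4.12) is achieved when $R(\gamma \| \sigma_b) = 0$, i.e., $\gamma = \sigma_b$. ■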
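Lemma 4.3 can be illustrated numerically. The sketch below approximates $R(\gamma \| \sigma)$ by a midpoint rule for two densities with the same mean $b$: the exponential density with mean $b$, for which the value should match $g(b)$, and a Gamma density with shape 2, which should give a strictly larger value. The truncation point, the step count, and the choice of competitor are assumptions of this example.

```python
import math


def g(b):
    return -math.log(b) + b - 1.0


def entropy_vs_exp1(density, upper=80.0, steps=200_000):
    """Midpoint-rule approximation of R(gamma || sigma), sigma = Exp(1).

    gamma(du) = density(u) du on [0, upper]; the tail beyond `upper` is ignored.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        p = density(u)
        if p > 0.0:
            # log(d gamma / d sigma)(u) = log p(u) - log e^{-u} = log p(u) + u
            total += p * (math.log(p) + u) * h
    return total


def exp_density(b):
    """Density of the exponential distribution with mean b."""
    return lambda u: math.exp(-u / b) / b


def gamma2_density(b):
    """Density of a Gamma distribution with shape 2 and mean b."""
    r = 2.0 / b
    return lambda u: r * r * u * math.exp(-r * u)


if __name__ == "__main__":
    b = 2.0
    print(g(b), entropy_vs_exp1(exp_density(b)), entropy_vs_exp1(gamma2_density(b)))
```

The gap between the last two values is $R(\gamma \| \sigma_b)$ for the Gamma competitor, in line with the decomposition used in the proof.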
Proof  of  Lemma  4.2.  Lemma  4.3  guarantees  that 

REt  >  fYl9  (/ u^i(du)j  .  (4.13) 

Recall  the  definition  of  J~i  as  the  u-algebra  generated  by  the  controlled  process  up  to 
time  i.  Since  <77  selects  the  conditional  distribution  of  77, 


E  [n\Fi-i\ 


j  v,(Ji  (du) . 


Define  mt  =  f  udi  (du),  for  i  =  1, . . . ,  tt  —  1.  The  definition  of  mrT  requires  more 
work.  Recalling  the  definition  of  Rt  by  the  equation  analogous  to  (4.2)  and  the 
restriction  that  Rt  =  t't- 


Tt~  1 


Ti 


< 


I  frjp 


U  q{Xi-i)  -qix^y 

Multiplying  both  sides  by  q  ( XrT_i )  and  taking  expectation  conditioned  on  d>T_i, 

/  rT~  1  =  \  r 


(M-i)  r-  53 


<  L1  [rrT|.7>;r_l]  =  /  7i<7rT  (du) . 


Define 


rT~  1 


At  =  q  (Xrr_!)  T  -  ]T 


Ti 


(4.14) 
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and  define  mrT  by 


mrT 


f  uarT  (dn)  if  A t  <  f  uarT  ( du )  <  1 
1  if  A t  <  1  <  f  udrT  (du) 

At  if  1  <  A t  A  f  uarT  (du) 


(4.15) 


i.e. ,  mrT  is  the  median  of  the  triplet  (At,  f  uarT  (du) ,  l).  Since  g  is  increasing  on 
(1,  oo),  we  have  g  (f  udrT  ( du ))  >  g  (fhrT)  in  all  three  cases.  Thus  by  (4.13), 


ret  >  h  9  (  f  UGi  (du)  J  >  1=  9  (rn)  • 

4=1  '  1  4=1 
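The two facts used about the median construction in (4.15) can be checked mechanically. The sketch below assumes only that $0 \le A_T \le \int u\,\bar\sigma_{r_T}(du)$, as derived before (4.14); the helper name `m_hat_last` is hypothetical and not taken from the source.

```python
import math


def g(x):
    """g(x) = x - 1 - log(x), the function appearing in Lemma 4.3."""
    return x - 1.0 - math.log(x)


def m_hat_last(a_T, mean_sigma):
    """Median of the triplet (a_T, mean_sigma, 1), as in (4.15).

    a_T plays the role of A_T and mean_sigma the role of the mean of the
    last controlled holding-time distribution; it is assumed that
    0 <= a_T <= mean_sigma.
    """
    return sorted([a_T, mean_sigma, 1.0])[1]
```

In each of the three cases the median $m$ satisfies $g(\text{mean}) \ge g(m)$ and $|A_T - m| \le 1$; the second property is what yields a total variation error of order $1/T$ later in the argument.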

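Next consider the measure on $S \times \mathbb{R}_+$ defined by

$$\xi_T(dx, du) = \frac{1}{T}\sum_{i=1}^{r_T} \delta_{\bar X_{i-1}}(dx)\,\delta_{(\hat m_i)^{-1}}(du)\,\hat m_i. \qquad (4.17)$$

The total mass of $E\xi_T$ is

$$E\xi_T(S \times \mathbb{R}_+) = \frac{1}{T}\sum_{i=1}^{r_T} E[\hat m_i].$$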

According  to  (4.9)  and  the  assumption  that  F  >  0,  we  have 

sup  E  [-R-Efi]  <  oo. 

T 


(4.16) 


(4.17) 


(4.18) 


By (4.16), $\sup_T E\left[\frac{1}{T}\sum_{i=1}^{r_T} g(\hat m_i)\right] < \infty$.  We also have by a straightforward calculation
that $x \le \max\{50, 10\,g(x)/9\}$.  Using this and the fact that $r_T/T \le C$ we have
$\sup_T E\left[\sum_{i=1}^{r_T} \hat m_i / T\right] < \infty$, i.e., the total mass of $E\xi_T$ has a bound uniform in $T$.
Thus when viewed as a sequence of measures on the compact space $S \times [0,\infty]$, $E\xi_T$
is tight due to the uniform boundedness of the total mass.  We denote the weak limit
by $\xi$, which is a finite measure.  Since the function $\ell$ is nonnegative and continuous,
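The elementary bound $x \le \max\{50, 10\,g(x)/9\}$ with $g(x) = x - 1 - \log x$ can be confirmed on a grid. The following is a sanity check rather than a proof; the grid itself is an arbitrary choice made for this illustration.

```python
import math


def g(x):
    """g(x) = x - 1 - log(x)."""
    return x - 1.0 - math.log(x)


def bound_holds(x):
    """Check x <= max{50, 10 g(x) / 9} at a single point x > 0."""
    return x <= max(50.0, 10.0 * g(x) / 9.0)


# Logarithmic grid from 1e-3 to 1e6, plus a fine grid around the
# crossover point x = 50 where the two branches of the max meet.
grid = [10.0 ** (k / 100.0) for k in range(-300, 601)]
grid += [45.0 + 0.001 * k for k in range(10001)]
all_ok = all(bound_holds(x) for x in grid)
```

For $x > 50$ the bound reduces to $x \ge 10 + 10\log x$, which holds at $x = 50$ and is preserved for larger $x$ because the left side grows faster.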


lim  inf  E  [REt\  >  lim  inf  E 

T — xx)  T — xx) 


rT 

~^2g(fhi) 

4=1 


=  lim  inf  E 

T— xx) 

=  lim  inf  /  l  (u)  E£t  (dx,  du) 

T^°°  JSx R+ 

>  /  i  (u)  £  (dx,  du) . 

J  SxR+ 


'  Sxl 


i  (u)  £t  (dx,  du) 


(4.19) 


We next explore the relation between $\xi$ and $\bar\mu$.  In order to establish (4.11), it
suffices to show that for any bounded continuous function $f$ on $S$

$$\int_{S \times \mathbb{R}_+} u f(x)\,\xi(dx, du) = A \int_S f(x)\,[\bar\mu]_1(dx).$$

By the definitions of $\xi_T$ and $\bar\mu_T$,

$$\int_{S \times \mathbb{R}_+} u f(x)\,E\xi_T(dx, du) = \frac{r_T}{T}\int_S f(x)\,[E\bar\mu_T]_1(dx). \qquad (4.20)$$

Then (4.18) and (4.19) imply there is a uniform upper bound on

$$\int_{\mathbb{R}_+} \ell(u) \int_S f(x)\,E\xi_T(dx, du).$$

If we consider $\int_S f(x)\,E\xi_T(dx, du)$ as a sequence of measures on $\mathbb{R}_+$ with bounded
total mass, then $\int_S f(x)\,E\xi_T(dx, du)$ converges weakly to $\int_S f(x)\,\xi(dx, du)$.  Since $\ell$
is superlinear, [4, Theorem A.3.19] implies that

$$\lim_{T\to\infty} \int_{S \times \mathbb{R}_+} u f(x)\,E\xi_T(dx, du) = \int_{S \times \mathbb{R}_+} u f(x)\,\xi(dx, du).$$

Using

$$\lim_{T\to\infty} \frac{r_T}{T}\int_S f(x)\,[E\bar\mu_T]_1(dx) = A \int_S f(x)\,[\bar\mu]_1(dx)$$

and (4.20) we arrive at (4.11).  ■

4.2.4  The  term  Efjx 


Lemma 4.4  Given any sequence of controls $\{\bar\alpha_i, \bar\sigma_i\}$, fix a subsequence of $T$ for which
the conclusions in Lemma 4.2 hold.  Then we can extract a further subsequence along
which

$$\liminf_{T\to\infty} E\left[F(\bar\eta_T)\right] \ge F(\bar\eta)$$

for some probability measure $\bar\eta$ on $S$, which is related to $\xi$ in Lemma 4.2 by

$$q(x)\,\bar\eta(dx) = [\xi]_1(dx). \qquad (4.21)$$

Proof.  Since $\{E\bar\eta_T\}$ is a sequence of probability measures on the compact space $S$, we can always
extract a subsequence of $T$ such that $E\bar\eta_T$ converges weakly to a probability measure
on $S$, which we denote by $\bar\eta$.  The convexity and lower semicontinuity of $F$ imply that

$$\liminf_{T\to\infty} E\left[F(\bar\eta_T)\right] \ge \liminf_{T\to\infty} F(E\bar\eta_T) \ge F(\bar\eta).$$
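By the definitions of $\bar\eta_T$ in (4.6) and $A_T$ in (4.14),

$$
\begin{aligned}
q(x)\,E\bar\eta_T(dx)
&= \frac{q(x)}{T}\,E\left[ \sum_{i=1}^{r_T-1} \delta_{\bar X_{i-1}}(dx)\,\frac{\bar\tau_i}{q(\bar X_{i-1})} + \delta_{\bar X_{r_T-1}}(dx)\left( T - \sum_{i=1}^{r_T-1} \frac{\bar\tau_i}{q(\bar X_{i-1})} \right) \right] \\
&= \frac{1}{T}\,E\left[ \sum_{i=1}^{r_T-1} \delta_{\bar X_{i-1}}(dx)\,\bar\tau_i + \delta_{\bar X_{r_T-1}}(dx)\,A_T \right] \\
&= \frac{1}{T}\left( \sum_{i=1}^{r_T-1} E\left[ E\left[ \delta_{\bar X_{i-1}}(dx)\,\bar\tau_i \mid \mathcal{F}_{i-1} \right] \right] + E\left[ \delta_{\bar X_{r_T-1}}(dx)\,A_T \right] \right) \\
&= \frac{1}{T}\left( \sum_{i=1}^{r_T-1} E\left[ \delta_{\bar X_{i-1}}(dx)\,\hat m_i \right] + E\left[ \delta_{\bar X_{r_T-1}}(dx)\,A_T \right] \right).
\end{aligned}
$$

Recalling the definition of $\xi_T$ in (4.17), we have

$$[\xi_T]_1(dx) = \frac{1}{T}\sum_{i=1}^{r_T} \delta_{\bar X_{i-1}}(dx)\,\hat m_i.$$

This implies the total variation bound

$$\left\| q(x)\,E\bar\eta_T(dx) - [E\xi_T]_1(dx) \right\|_{TV} \le \frac{1}{T}\,E\left| A_T - \hat m_{r_T} \right|.$$

Recalling the definition of $\hat m_{r_T}$ in (4.15), in each of the three cases $|A_T - \hat m_{r_T}| \le 1$, and we conclude that

$$\left\| q(x)\,E\bar\eta_T(dx) - [E\xi_T]_1(dx) \right\|_{TV} \le \frac{1}{T}.$$

By taking limits we arrive at (4.21).  ■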

Lemma 4.1, Lemma 4.2 and Lemma 4.4 together imply that for a sequence of controls
satisfying (4.5), along any subsequence of $T$ such that $r_T/T \to A$, we can
extract a further subsequence along which


lim  inf  E  [F  (tjt)  +  RE?  +  RE^\  >  F  (fj)  +  AR  (ft  ||  [jj]t  a )  +  /  l  (u)  £  (dx,  du) 

T^°°  Jsx  K+ 

(4.22) 

where $\bar\eta$, $\bar\mu$ and $\xi$ satisfy the constraints (4.11), (4.21), and (4.10) if $A > 0$.

Recall  that  our  goal  is  to  prove  (4.8).  Hence  we  need  to  establish  the  relationship 
between  the  right  hand  side  of  (4.22)  and  the  rate  function  I  defined  in  Section  3.1. 


4.2.5  Properties  of  the  rate  function  I 

We prove the following lemma, for which we adopt the convention $0 \cdot \infty = 0$.  This
lemma is the key link: it shows that the rate function naturally obtained from the
weak convergence analysis used to prove the upper bound equals $I$ for suitable
measures, and it also indicates how to construct controls to prove the lower bound for
this same collection of measures.  Note that the constraints appearing in the lemma
hold for the subsequence appearing in (4.22) due to Lemmas 4.1, 4.2 and 4.4.
Lemma 4.5  Let $I(\eta)$ be defined by (3.2).  Suppose that $\eta \ll \pi$, that $\mu$ and $\xi$ satisfy
the constraints

$$q(x)\,\eta(dx) = [\xi]_1(dx) \quad\text{and}\quad \int_{\mathbb{R}_+} u\,\xi(dx, du) = A\,[\mu]_1(dx), \qquad (4.23)$$

and that when $A > 0$ the constraint $[\mu]_1 = [\mu]_2$ is also true.  Then

$$I(\eta) \le A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du). \qquad (4.24)$$

Moreover,

$$I(\eta) = \inf\left[ A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) \right],$$

where the infimum is over all possible choices of $A \ge 0$, $\mu$ and $\xi$ satisfying these
constraints.


The proof of this lemma is detailed.  The reason we present it here instead of in
an appendix is the previously mentioned fact that the construction of $A$, $\mu$ and $\xi$
that minimize the right hand side of (4.24) indicates how to hit target measures $\eta$
that are absolutely continuous with respect to the invariant measure in the proof of
the Laplace lower bound.

Proof.  We first prove the inequality (4.24).  If the right hand side of (4.24) is $\infty$,
there is nothing to prove.  Hence we assume it is finite.  First assume $A > 0$, in which
case $R(\mu \,\|\, [\mu]_1 \otimes \alpha) < \infty$.  Define

$$Q = \int_S q(x)\,\pi(dx), \qquad (4.25)$$

so that by (2.4)

$$\tilde\pi(dx) = q(x)\,\pi(dx)/Q. \qquad (4.26)$$

Since $\tilde\pi$ is invariant under $\alpha$, by [4, Lemma 8.6.2] $[\mu]_1 \ll \tilde\pi$.  By (2.1) $q$ is bounded from
below, and hence $[\mu]_1 \ll \pi$.  Recall that the definition of $I$ in (3.2) uses $\theta = d\eta/d\pi$.
Define $\Theta = \{x \in S : \theta(x) = 0\}$.  By (4.23)

$$
\begin{aligned}
[\mu]_1(dx)
&= \frac{\int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x)}{A}\,[\xi]_1(dx) \\
&= \frac{\int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x)}{A}\,q(x)\,\eta(dx) \\
&= \frac{\int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x)}{A}\,q(x)\,\theta(x)\,\pi(dx),
\end{aligned}
\qquad (4.27)
$$

where for a measure $\nu$ on $S \times \mathbb{R}_+$, $\nu_{2|1}$ denotes the regular conditional distribution
on the second argument given the first.  Thus $[\mu]_1(\Theta) = 0$.  Now suppose that


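$$\int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x, dy)\,\pi(dx) = 0.$$

Then for $\pi$-a.e. $x \in S \setminus \Theta$, $\alpha(x, \Theta) = 1$, and hence $([\mu]_1 \otimes \alpha)[(S \setminus \Theta) \times \Theta] = 1$.  On
the other hand, $\mu((S \setminus \Theta) \times \Theta) = 0$ due to $[\mu]_1 = [\mu]_2$.  This violates the fact that
$R(\mu \,\|\, [\mu]_1 \otimes \alpha) < \infty$.  We conclude that

$$\int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x, dy)\,\pi(dx) > 0.$$

Lemma 3.2 implies that

$$
\begin{aligned}
-\log \int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,\alpha(x, dy)\,\tilde\pi(dx)
&= -\log \int_{S \times S} e^{\frac{1}{2}[\log\theta(x) + \log\theta(y)]}\,\alpha(x, dy)\,\tilde\pi(dx) \\
&\le R(\mu \,\|\, \tilde\pi \otimes \alpha) - \frac{1}{2}\int_{S \times S} [\log\theta(x) + \log\theta(y)]\,\mu(dx, dy).
\end{aligned}
\qquad (4.28)
$$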


Strictly speaking, the inequality above does not fall into the framework of Lemma 3.2
because $\log\theta$ is not bounded.  However, if one goes through the proof of this lemma
[4, Proposition 1.4.2], then the above inequality is true as long as the right hand side
is not of the form $\infty - \infty$.  Towards this end, it suffices to prove

$$\frac{1}{2}\int_{S \times S} [\log\theta(x) + \log\theta(y)]\,\mu(dx, dy) = \int_S \log\theta(x)\,[\mu]_1(dx) < \infty. \qquad (4.29)$$

In the appendix we will prove [this being the only place where Condition 2.6 is used]
that

$$R([\mu]_1 \,\|\, \tilde\pi) < \infty. \qquad (4.30)$$

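For now, we assume this is true.  Using (4.25), (4.26), and (4.27) to evaluate the
relative entropy,

$$\infty > R([\mu]_1 \,\|\, \tilde\pi) = \int_S \log\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\mu]_1(dx) + \int_S \log\theta(x)\,[\mu]_1(dx) + \log\frac{Q}{A}. \qquad (4.31)$$

We know from (2.1) that $Q \ge K_1$.  Also, by (4.23) and the nonnegativity of $\ell$

$$
\begin{aligned}
\int_S \log\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\mu]_1(dx)
&= \frac{1}{A}\int_S \left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)\log\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\xi]_1(dx) \\
&= \frac{1}{A}\int_S \ell\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\xi]_1(dx) + \frac{1}{A}\int_{S \times \mathbb{R}_+} u\,\xi(dx, du) - \frac{1}{A}\int_S [\xi]_1(dx) \\
&= \frac{1}{A}\int_S \ell\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\xi]_1(dx) + \int_S [\mu]_1(dx) - \frac{1}{A}\int_S q(x)\,\eta(dx) \\
&\ge 1 - \frac{1}{A}\int_S q(x)\,\eta(dx);
\end{aligned}
\qquad (4.32)
$$

the second constraint in (4.23) is used for the first equality; the definition of $\ell$ gives
the second equality; both parts of (4.23) assure the third equality; finally the
nonnegativity of $\ell$ is used.  Thus rearranging (4.31) gives (4.29).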

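The chain rule of relative entropy gives

$$
\begin{aligned}
R(\mu \,\|\, \tilde\pi \otimes \alpha) &- \frac{1}{2}\int_{S \times S} [\log\theta(x) + \log\theta(y)]\,\mu(dx, dy) \\
&= R([\mu]_1 \,\|\, \tilde\pi) + \int_S R(\mu_{2|1} \,\|\, \alpha)\,[\mu]_1(dx) - \int_S \log\theta(x)\,[\mu]_1(dx) \\
&= R([\mu]_1 \,\|\, \tilde\pi) + R(\mu \,\|\, [\mu]_1 \otimes \alpha) - \int_S \log\theta(x)\,[\mu]_1(dx).
\end{aligned}
\qquad (4.33)
$$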

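By (4.31) and (4.32) and the convexity of $\ell$

$$
\begin{aligned}
R([\mu]_1 \,\|\, \tilde\pi) &- \int_S \log\theta(x)\,[\mu]_1(dx) \\
&= \int_S \log\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\mu]_1(dx) + \log\frac{Q}{A} \\
&= \frac{1}{A}\int_S \ell\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\xi]_1(dx) + \int_S [\mu]_1(dx) - \frac{1}{A}\int_S q(x)\,\eta(dx) + \log\frac{Q}{A} \\
&\le \frac{1}{A}\int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) + 1 - \frac{1}{A}\int_S q(x)\,\eta(dx) + \log\frac{Q}{A}.
\end{aligned}
\qquad (4.34)
$$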


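In summary (4.28), (4.33) and (4.34) imply

$$
\begin{aligned}
-\log \int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x, dy)\,\pi(dx)
&= -\log \int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,\alpha(x, dy)\,\tilde\pi(dx) - \log Q \\
&\le R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \frac{1}{A}\int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) + 1 - \frac{1}{A}\int_S q(x)\,\eta(dx) + \log\frac{1}{A}.
\end{aligned}
$$

Thus

$$
-\int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x, dy)\,\pi(dx)
\le -\exp\left\{ -\left[ R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \frac{1}{A}\int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) + 1 - \frac{1}{A}\int_S q(x)\,\eta(dx) + \log\frac{1}{A} \right] \right\}.
$$

(4.24) then follows from the fact that $-e^{-r} \le ar + a\log a - a$ for any $r \in \mathbb{R}$ and
$a \in \mathbb{R}_+$ by taking $a = A$ and

$$r = R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \frac{1}{A}\int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) + 1 - \frac{1}{A}\int_S q(x)\,\eta(dx) + \log\frac{1}{A}.$$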
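The elementary inequality $-e^{-r} \le ar + a\log a - a$ is a Legendre-type bound: for fixed $a > 0$ the gap $ar + a\log a - a + e^{-r}$ is convex in $r$ and vanishes at $r = -\log a$. A minimal numerical sketch, with grid values chosen arbitrarily for illustration:

```python
import math


def gap(r, a):
    """Right side minus left side of  -exp(-r) <= a r + a log a - a,  for a > 0."""
    return a * r + a * math.log(a) - a + math.exp(-r)


# Evaluate the gap on a grid of r in [-5, 5] and a in [0.01, 20].
r_values = [-5.0 + 0.05 * k for k in range(201)]
a_values = [0.01 * (k + 1) for k in range(2000)]
min_gap = min(gap(r, a) for r in r_values for a in a_values)
```

The minimum over the grid is nonnegative up to rounding, and the gap is zero exactly at $r = -\log a$, which explains why the choice $a = A$ in the proof loses nothing when $A$ is the optimizer identified in the second part of the lemma.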
For the case when $A = 0$, (4.23) implies that $\int_{\mathbb{R}_+} u\,\xi(dx, du) = 0$, which means that
$\int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) = 0$ $[\xi]_1$-a.e.  Hence by the convexity of $\ell$ and $q(x)\,\eta(dx) = [\xi]_1(dx)$

$$
\begin{aligned}
\int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du)
&\ge \int_S \ell\left( \int_{\mathbb{R}_+} u\,\xi_{2|1}(du|x) \right)[\xi]_1(dx) \\
&= \int_S [\xi]_1(dx) \\
&= \int_S q(x)\,\eta(dx) \\
&\ge \int_S q(x)\,\eta(dx) - \int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x, dy)\,\pi(dx) \\
&= I(\eta).
\end{aligned}
$$

Thus (4.24) also holds in this case, which completes the proof of the first part of Lemma
4.5.

We now turn to the second part of Lemma 4.5.  The definitions and constructions
used here will also be used to construct what are essentially optimal controls to
prove the reverse inequality in the next section, and indeed the particular forms of
the definitions are suggested by that use.  In particular, $A\kappa(x)$ will correspond to a
dilation of the mean for the exponential random variables.  In light of the second part
of Lemma 3.2, we define $\mu$ by

$$\frac{d\mu}{d(\tilde\pi \otimes \alpha)}(x, y) = \frac{\theta^{1/2}(x)\,\theta^{1/2}(y)}{\int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,(\tilde\pi \otimes \alpha)(dx, dy)}. \qquad (4.35)$$

Note that the Cauchy-Schwarz inequality, the detailed balance condition (2.3),
and the relation between $\tilde\pi$ and $\pi$ (see (2.4)) imply

$$\int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,(\tilde\pi \otimes \alpha)(dx, dy) \le \int_{S \times S} \theta(x)\,\alpha(x, dy)\,\tilde\pi(dx) \le \frac{K_2}{Q}.$$

Hence $\mu$ is well defined and $[\mu]_1 = [\mu]_2$.  Then Lemma 3.2 implies that

$$-\log \int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,\alpha(x, dy)\,\tilde\pi(dx) = R(\mu \,\|\, \tilde\pi \otimes \alpha) - \int_S \log\theta(x)\,[\mu]_1(dx). \qquad (4.36)$$

If $R(\mu \,\|\, \tilde\pi \otimes \alpha) = \infty$ or $-\int_S \log\theta(x)\,[\mu]_1(dx) = \infty$, the last display implies

$$\int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x, dy)\,\pi(dx) = 0.$$

Letting $A = 0$ and $\xi(dx, du) = q(x)\,\eta(dx)\,\delta_0(du)$, we see that $\xi$ and $\mu$ satisfy (4.23) and

$$A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) = \int_S q(x)\,\eta(dx) = I(\eta).$$


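Next assume $R(\mu \,\|\, \tilde\pi \otimes \alpha) < \infty$ and $-\int_S \log\theta(x)\,[\mu]_1(dx) < \infty$.  Define $A$ by

$$A = \exp\left\{ -\left[ R(\mu \,\|\, \tilde\pi \otimes \alpha) - \int_S \log\theta(x)\,[\mu]_1(dx) - \log Q \right] \right\}. \qquad (4.37)$$

Define the measure

$$\rho(dx) = q(x)\,\theta(x)\,\pi(dx), \qquad (4.38)$$

and

$$\kappa = d[\mu]_1/d\rho. \qquad (4.39)$$

Then for any $x \in S \setminus \Theta$ (recall $\Theta = \{x \in S : \theta(x) = 0\}$)

$$\kappa(x) = \frac{d[\mu]_1}{d\rho}(x) = \frac{1}{Q\,\theta(x)}\,\frac{d[\mu]_1}{d\tilde\pi}(x).$$

In addition

$$
\begin{aligned}
\int_S \kappa(x)\log\kappa(x)\,\rho(dx)
&= \int_S \log\kappa(x)\,[\mu]_1(dx) \\
&= R([\mu]_1 \,\|\, \tilde\pi) - \int_S \log\theta(x)\,[\mu]_1(dx) - \log Q.
\end{aligned}
\qquad (4.40)
$$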


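Define

$$
b(x) =
\begin{cases}
0 & \text{for } x \in \Theta, \\
A\kappa(x) & \text{for } x \notin \Theta,
\end{cases}
\qquad (4.41)
$$

and

$$\xi(dx, du) = q(x)\,\eta(dx)\,\delta_{b(x)}(du). \qquad (4.42)$$

Then $\xi$ satisfies the first part of (4.23).  To see that the second part of (4.23) is
satisfied, note that

$$[\mu]_1(\Theta) = 0 = \int_{\Theta \times \mathbb{R}_+} u\,\xi(dx, du)$$

and that on $S \setminus \Theta$

$$
\begin{aligned}
\int_{\mathbb{R}_+} u\,\xi(dx, du)
&= b(x)\,q(x)\,\eta(dx) \\
&= A\kappa(x)\,q(x)\,\theta(x)\,\pi(dx) \\
&= A\,[\mu]_1(dx).
\end{aligned}
$$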
By using the definitions we arrive at the following, each line of which is explained
below:

$$
\begin{aligned}
A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) &+ \int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) \\
&= A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_S \ell(b(x))\,q(x)\,\eta(dx) \\
&= A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_\Theta q(x)\,\eta(dx) + \int_{S \setminus \Theta} \ell(b(x))\,\rho(dx) \\
&= A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_S q(x)\,\eta(dx) + A\log A - A + A\int_S \kappa(x)\log\kappa(x)\,\rho(dx) \\
&= \int_S q(x)\,\eta(dx) + A\log A - A + A\left( R(\mu \,\|\, \tilde\pi \otimes \alpha) - \int_S \log\theta(x)\,[\mu]_1(dx) - \log Q \right) \\
&= \int_S q(x)\,\eta(dx) - A.
\end{aligned}
$$

The first equality uses (4.42) and the second uses (4.41).  The third uses (4.41) again,
expands $\ell$, and uses $\kappa = d[\mu]_1/d\rho$ and $\eta(\Theta) = \rho(\Theta) = 0$.  Equality four then uses
(4.40) together with the chain rule as in (4.33), and the fifth follows from (4.37).  Note that (4.36) and (2.4) imply

$$A = \int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,q(x)\,\alpha(x, dy)\,\pi(dx). \qquad (4.43)$$

Hence we obtain

$$A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_{S \times \mathbb{R}_+} \ell(u)\,\xi(dx, du) = I(\eta).$$
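The construction in the second part of Lemma 4.5 can be checked numerically on a finite state space, where all integrals become sums. The sketch below assumes a reversible embedded chain built from a symmetric positive weight matrix `w` (so that $\tilde\pi(x)\alpha(x,y)$ is symmetric), positive jump rates `q`, and a strictly positive target `eta`, so that $\Theta$ is empty. All names and the data layout are assumptions made for this illustration.

```python
import math


def ell(x):
    """ell(x) = x log x - x + 1, with ell(0) = 1."""
    return 1.0 if x == 0.0 else x * math.log(x) - x + 1.0


def lemma_4_5_quantities(w, q, eta):
    """Return (I(eta), A, cost) on a finite state space.

    w   : symmetric matrix of positive weights defining alpha and pi-tilde
    q   : positive jump intensities
    eta : strictly positive target probability vector
    cost(a) evaluates  a R(mu || [mu]_1 x alpha) + sum_x ell(a kappa(x)) q(x) eta(x).
    """
    n = len(q)
    states = range(n)
    total_w = sum(sum(row) for row in w)
    pi_tilde = [sum(w[x]) / total_w for x in states]
    alpha = [[w[x][y] / sum(w[x]) for y in states] for x in states]
    z = sum(pi_tilde[x] / q[x] for x in states)
    pi = [pi_tilde[x] / q[x] / z for x in states]       # pi-tilde = q pi / Q
    big_q = sum(q[x] * pi[x] for x in states)            # Q in (4.25)
    theta = [eta[x] / pi[x] for x in states]
    root = [[math.sqrt(theta[x] * theta[y]) * pi_tilde[x] * alpha[x][y]
             for y in states] for x in states]
    norm = sum(sum(row) for row in root)
    mu = [[root[x][y] / norm for y in states] for x in states]   # (4.35)
    mu1 = [sum(mu[x]) for x in states]
    a_opt = big_q * norm                                  # (4.43)
    rate = sum(q[x] * eta[x] for x in states) - a_opt     # I(eta)
    kappa = [mu1[x] / (q[x] * eta[x]) for x in states]    # (4.39)
    rel = sum(mu[x][y] * math.log(mu[x][y] / (mu1[x] * alpha[x][y]))
              for x in states for y in states)

    def cost(a):
        return a * rel + sum(ell(a * kappa[x]) * q[x] * eta[x] for x in states)

    return rate, a_opt, cost
```

Evaluating `cost` at the value of $A$ from (4.43) should reproduce $I(\eta)$ to rounding error, while other values of $A$ (with $b(x) = A\kappa(x)$ rescaled accordingly, which keeps the constraints (4.23) satisfied) give a larger cost.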


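The representation formula (4.4), the lower bound (4.22) and Lemma 4.5 together
give

$$\liminf_{T\to\infty} -\frac{1}{T}\log E\left[ \exp\left\{ -T F(\eta_T) - T \cdot \infty \cdot \left( 1_{\{r_T/T\}^c}(R_T/T) \right) \right\} \right] \ge \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right]. \qquad (4.44)$$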

4.3  Combining  the  cases 

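In the last section, we showed that (4.44) is valid for any sequence $\{r_T\}$ such that
$r_T/T \to A \in [0, C]$.  An argument by contradiction shows that the bound is uniform
in $A$.  Thus

$$
\begin{aligned}
\liminf_{T\to\infty} &-\frac{1}{T}\log\left\{ \sum_{r_T=1}^{\lfloor TC \rfloor} E\left[ \exp\left\{ -T F(\eta_T) - T \cdot \infty \cdot \left( 1_{\{r_T/T\}^c}(R_T/T) \right) \right\} \right] \right\} \\
&\ge \liminf_{T\to\infty} -\frac{1}{T}\log\left\{ TC \cdot \bigvee_{r_T=1}^{\lfloor TC \rfloor} E\left[ \exp\left\{ -T F(\eta_T) - T \cdot \infty \cdot \left( 1_{\{r_T/T\}^c}(R_T/T) \right) \right\} \right] \right\} \\
&\ge \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right].
\end{aligned}
$$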
We now partition $E\left[\exp\{-T F(\eta_T)\}\right]$ according to the two cases to obtain the overall
lower bound

$$\liminf_{T\to\infty} -\frac{1}{T}\log E\left[\exp\{-T F(\eta_T)\}\right] \ge \min\left\{ \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right],\ \left[ -C + C\log C + K_2 - C\log K_2 \right] \right\}.$$

Letting $C \to \infty$ we have the desired Laplace upper bound

$$\liminf_{T\to\infty} -\frac{1}{T}\log E\left[\exp\{-T F(\eta_T)\}\right] \ge \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right]. \qquad (4.45)$$
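The second term in the minimum can be rewritten as $K_2\,\ell(C/K_2)$ with $\ell(x) = x\log x - x + 1$, which makes it clear why it drops out as $C \to \infty$. A short sketch, where `K2` stands for the constant $K_2$ in the display above and the numerical values are arbitrary:

```python
import math


def ell(x):
    """ell(x) = x log x - x + 1 for x > 0."""
    return x * math.log(x) - x + 1.0


def excess_jump_cost(C, K2):
    """The term  -C + C log C + K2 - C log K2  from the overall lower bound."""
    return -C + C * math.log(C) + K2 - C * math.log(K2)
```

Since $\ell$ is superlinear, `excess_jump_cost(C, K2)` eventually exceeds any fixed value of $\inf[F + I]$, so the minimum is attained by the first term for large $C$.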


5  Proof  of  Laplace  lower  bound 

We turn to the proof of the reverse inequality

$$\limsup_{T\to\infty} -\frac{1}{T}\log E\left[\exp\{-T F(\eta_T)\}\right] \le \inf_{\eta \in \mathcal{P}(S)} \left[ F(\eta) + I(\eta) \right]. \qquad (5.1)$$

Let $F$ be a nonnegative bounded and continuous function.  Fix an arbitrary $\varepsilon > 0$
and choose $\eta$ such that

$$F(\eta) + I(\eta) \le \inf_{\nu \in \mathcal{P}(S)} \left[ F(\nu) + I(\nu) \right] + \varepsilon. \qquad (5.2)$$

As pointed out in Remark 2.9, $\mathcal{H}$ defined in (3.1) is dense in $\mathcal{P}(S)$.  Since $I$ was
extended from $\mathcal{H}$ to $\mathcal{P}(S)$ via lower semicontinuous regularization, we can assume
without loss of generality that $\eta \ll \pi$.  Define $\theta = d\eta/d\pi$.  We now argue we can
further assume there exists $\delta > 0$ such that

$$\delta \le \theta(x) \le \frac{1}{\delta} \qquad (5.3)$$

for all $x \in S$.  If $\eta^\delta = (1 - \delta)\,\eta + \delta\pi$ then $d\eta^\delta/d\pi \ge \delta$, and the continuity of $F$ and
the convexity of $I$ imply that the difference between $F(\eta^\delta) + I(\eta^\delta)$ and $F(\eta) + I(\eta)$
can be made arbitrarily small.

Thus we can assume $\theta$ is uniformly bounded from below away from zero.  Let
$n \in \mathbb{N}$, and define

$$\eta^n(dx) = \theta(x)\,1_{\{\theta(x) \le n\}}\,\pi(dx) + \frac{\eta(\{x : \theta(x) > n\})}{\pi(\{x : \theta(x) > n\})}\,1_{\{\theta(x) > n\}}\,\pi(dx).$$

Then $d\eta^n/d\pi \le \left[ \eta(\{x : \theta(x) > n\}) / \pi(\{x : \theta(x) > n\}) \right] \vee n$, and since $\eta \ll \pi$ implies
$\pi(\{x : \theta(x) > n\}) \to 0$, $\eta^n$ converges weakly to $\eta$.  It then follows from the continuity
of $F$, the definition of $I$, and the convexity of $\theta \to -\theta^{1/2}$ that we can choose $\eta$ satisfying
(5.2) with $2\varepsilon$ replacing $\varepsilon$ and also (5.3).  Hence we assume $\eta$ satisfies (5.2) and (5.3).
Furthermore by Lusin’s Theorem [7, Theorem 7.10], we can also assume that $\theta$ is
continuous.
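The two regularizations of the density $\theta$ used above (mixing with $\pi$, then averaging over the set where $\theta$ is large) can be illustrated on a finite state space. This is a sketch under the assumption that `pi` is a probability vector and `theta` a density with respect to it; the function names are hypothetical.

```python
def mix_with_reference(theta, delta):
    """Density of (1 - delta) eta + delta pi with respect to pi."""
    return [(1.0 - delta) * t + delta for t in theta]


def truncate_density(theta, pi, n):
    """Density of eta^n with respect to pi: theta is kept where theta <= n and
    replaced by eta({theta > n}) / pi({theta > n}) where theta > n."""
    big = [x for x in range(len(pi)) if theta[x] > n]
    if not big:
        return list(theta)
    eta_big = sum(theta[x] * pi[x] for x in big)
    pi_big = sum(pi[x] for x in big)
    ratio = eta_big / pi_big
    return [ratio if theta[x] > n else theta[x] for x in range(len(pi))]
```

Both operations preserve total mass one; the first forces the density to be at least $\delta$, and the second bounds it above by the larger of $n$ and the averaged value.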
The proof of the lower bound will use the following representation.  The infimum
in the representation is taken over all control measures $\{\bar\alpha_i, \bar\sigma_i\}$, and the properties
of such measures and how $\bar\eta_T$ and $R_T$ are constructed from them were discussed
immediately above the similar representation (4.4).  The proof of the lemma is given
in the appendix.


Lemma 5.1  Let $F : \mathcal{P}(S) \to \mathbb{R}$ be bounded and continuous.  Then

Rt 


log  E  [exp  {—T  F  (tit)}]  =  inf  E 


F{rjT)  +  \\a)  +  R{ai  ||cr)) 


i=  1 


where the infimum is taken over all control measures $\{\bar\alpha_i, \bar\sigma_i\}$.

Suppose that given any measure $\eta \in \mathcal{P}(S)$ satisfying (5.2) and (5.3), one can
construct $\bar\alpha_i$ and $\bar\sigma_i$ such that given any subsequence of $T$, there is a further subsequence
$T_n$ such that

lim  E 

Tn—>  OO 


f(vtJ  + 


1 

Tn 


rt„ 

Y  1 


a)  +  R(<Ji  || cr )) 


$= F(\eta) + I(\eta).$


Then Lemma 5.1 implies the Laplace lower bound (5.1).  The construction of suitable
$\bar\alpha_i$ and $\bar\sigma_i$ turns on many of the same constructions as those used in the proof of the
second part of Lemma 4.5.  We first define $\mu \in \mathcal{P}(S \times S)$ as in (4.35).  Then
automatically $[\mu]_1 = [\mu]_2$, and hence if we define $p$ as the regular conditional probability
such that $\mu = [\mu]_1 \otimes p$, then $[\mu]_1$ is invariant under $p$ [4, Lemma 8.5.1(a)].  Define
$\bar\alpha_i = p$ for each $i$, and let $\{\bar X_i\}$ be the corresponding Markov chain.  Next define
$\rho(dx) = q(x)\,\eta(dx)$ and

$$\kappa(x) = \frac{d[\mu]_1}{d\rho}(x) = \frac{1}{Q\,\theta(x)}\,\frac{d[\mu]_1}{d\tilde\pi}(x). \qquad (5.4)$$


By (5.3), there is $M < \infty$ such that $1/M \le \kappa \le M$, and due to the continuity of $\theta$,
$\kappa$ is also continuous.  Notice that

$$\eta(dx) = (q(x)\,\kappa(x))^{-1}\,[\mu]_1(dx). \qquad (5.5)$$

Assumption (5.3) guarantees that

$$-\log \int_{S \times S} \theta^{1/2}(x)\,\theta^{1/2}(y)\,(\tilde\pi \otimes \alpha)(dx, dy) < \infty \quad\text{and}\quad -\int_S \log\theta(x)\,[\mu]_1(dx) < \infty,$$

and (4.36) then implies that $R(\mu \,\|\, \tilde\pi \otimes \alpha) < \infty$.  Define $A$ as in (4.43).  Let $\bar\sigma_i$ be the
exponential distribution with mean $[A\kappa(\bar X_{i-1})]^{-1}$ for each $i$.  Thus we can construct
a Markov jump process $\bar X(t)$ using $\bar\alpha_i$ and $\bar\sigma_i$ instead of $\alpha$ and $\sigma$, and the infinitesimal
generator $\bar{\mathcal{L}}$ will be bounded and continuous and takes the form

$$\bar{\mathcal{L}} f(x) = A\kappa(x)\,q(x) \int_S \left[ f(y) - f(x) \right] p(x, dy).$$
(5.5) and the fact that $[\mu]_1$ is invariant under $p$ imply $\int_S (\bar{\mathcal{L}} f(x))\,\eta(dx) = 0$, and $\eta$
is an invariant distribution of the continuous time process $\bar X$.  We claim that $\eta$ is the
unique invariant distribution of $\bar X$.  Indeed, by [6, Proposition 4.9.2] any invariant
distribution $\nu$ for $\bar X$ satisfies $\int_S (\bar{\mathcal{L}} f(x))\,\nu(dx) = 0$.  If we define

$$\tilde\nu(dx) = \frac{A\kappa(x)\,q(x)\,\nu(dx)}{\int_S A\kappa(x)\,q(x)\,\nu(dx)},$$

then $\tilde\nu$ is invariant under $p$.  However, by Condition 2.4 and [4, Lemma 8.6.3(c)] the
invariant measure under $p$ is unique, and hence the invariant measure of $\bar X$ is also
unique.  By the definition of $\bar\eta_T$ in (4.3),

$$
\bar\eta_T(\cdot) = \frac{1}{T}\int_0^T \delta_{\bar X(t)}(\cdot)\,dt
= \frac{1}{T}\left[ \sum_{i=1}^{R_T-1} \delta_{\bar X_{i-1}}(\cdot)\,\frac{\bar\tau_i}{q(\bar X_{i-1})} + \delta_{\bar X_{R_T-1}}(\cdot)\left( T - \sum_{i=1}^{R_T-1} \frac{\bar\tau_i}{q(\bar X_{i-1})} \right) \right]. \qquad (5.6)
$$

Since $S$ is compact we can extract a subsequence of $T$ such that $\bar\eta_T$ converges weakly,
and by [6, Theorem 4.9.3] this weak limit is $\eta$.  We claim the following along the same
subsequence.

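Lemma 5.2  $E[R_T/T] \to A$, $E\left[\sum_{i=1}^{R_T} R(\bar\sigma_i \,\|\, \sigma)\right]/T \to \int_S \ell(A\kappa(x))\,q(x)\,\eta(dx)$ and
$E\left[\sum_{i=1}^{R_T} \delta_{\bar X_{i-1}}(dx)\right]/T \to A\,[\mu]_1(dx)$.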

Proof.  As in the proof of the upper bound, a minor nuisance is dealing with the
residual time $T - \sum_{i=1}^{R_T-1} \bar\tau_i/q(\bar X_{i-1})$.  However, this is more easily controlled here since it is
bounded by an exponential with known mean.  Since $\bar\eta_T \to \eta$ weakly, we have for
any bounded and continuous function $f$ on the space of subprobability measures on
$S$ that $\lim_{T\to\infty} E\left[f(\bar\eta_T)\right] = f(\eta)$.  To prove the first part of the lemma, define $f$ by

$$f(\nu) = \int_S \kappa(x)\,q(x)\,\nu(dx).$$

Since both $\kappa$ and $q$ are bounded and continuous, $f$ is also bounded and continuous.
Using (5.5)

$$f(\eta) = \int_S \kappa(x)\,q(x)\,\eta(dx) = \int_S [\mu]_1(dx) = 1. \qquad (5.7)$$
Recall that $\mathcal{F}_i$ is the $\sigma$-algebra generated by $\{(\bar X_0, \ldots, \bar X_i), (\bar\tau_1, \ldots, \bar\tau_i)\}$.  Then


E 


Rt 


f  tJ2SX-i  (dx) 


=  fE 


=  te 


2=  1 
Rt 


q  {Xi- 1) 


J2K{Xi-i)q{Xi-i) 


Ti 


i= 1 


q  {Xt- 1) 


i—  1 


i=l 


Ee 

2=1 

oo 

Ee 

2=1 

OO 

Ee 


E 


2  —  1 


(*-0  E  ettEt  ^  T 


-t  9  (*;-l) 


Ti-l 


i— 1 


(^-i)  i  E 


fe  9  (*i-i) 


<T  E[Ti\Xi-i\ 


2=1 


=  -^=T 


E 

'  i- 1 


(^-i)  i  E 


ri  <  T 


riq{xj- 1)  “  1) 


AT 


4£ 


OO  /  2—1  _ 

SMSdE-i) 


<  T 


Rt 


This completes the proof of the first statement in the lemma.
The proof of the second statement is similar.  Define $f$ by

$$f(\nu) = \int_S \ell(A\kappa(x))\,q(x)\,\nu(dx).$$


Then  as  before, 


/  (77)  =  lim  E  [f  (r]T)]  =  lim  E 

T — >00  T — >00 


Rt 


f  tE^wW 


2=1 


q  (Xi- 1) 




Using $g(x) = x\,\ell(1/x)$ and Lemma 4.3, we have
Rt 


E 


Ti 


f  I  7F,  V  (dx)  /  -  \ 

1  T  ^  “-1  q  (Xi-i) 

Rt 


=  E 


1 


Ed^U-i)) 


Ti 


i=  1 


2=1 


i—  1 


^(^(^-0)1  E 


fe  9  (*;-0 


<T  |  Ufrd^_il 


=  E 


=  E 


=  E 


Rt 


iv  (  1 

T«  V^Ui-i). 

Rt 


1 


I> 

2=1 


<7*  0\ 


and  the  second  part  of  the  lemma  follows. 

The  proof  of  the  third  part  follows  very  similar  lines  as  the  first  two,  and  is 
omitted.  ■ 

Now the Laplace lower bound is straightforward.  The definition of $\mu$ in (4.35),
the continuity of $\theta$, and the bound (5.3) imply $x \to R(p(x, \cdot) \,\|\, \alpha(x, \cdot))$ is bounded
and continuous.  By Lemma 5.2 and the chain rule for relative entropy,


lim  E 

T— KX) 


Rt 


F(vt )  +  7p  y  (R  («i-i  II  a)  +  R{(Ji  ||cr)) 


2=1 


=  lim  E[F(rfr)]+  lim  /  R  (p  (x,  •)  ||a  (x,  •) )  E 

T — >oo  T — >oo  lq 


Rt 


■J26Xi-i  ( dx ) 


2=1 


+  lim  E 

T— >oo 


^  Rt 

~yR{ai  || cr ) 


2=1 


$$= F(\eta) + A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_S \ell(A\kappa(x))\,q(x)\,\eta(dx).$$


Returning to the proof of the second part of Lemma 4.5, we find that with this
choice of $A$, $\mu$ and $\kappa$, the rate function $I(\eta)$ coincides with $A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) +
\int_S \ell(A\kappa(x))\,q(x)\,\eta(dx)$ (note that this $\eta$ corresponds to a special case of Lemma 4.5
where $\Theta = \{x \in S : \theta(x) = 0\}$ is empty).  This completes the proof of the Laplace
lower bound.
6  On the boundedness of rate function

As pointed out in the Introduction, continuous time jump Markov processes differ
from the type of processes considered by Donsker and Varadhan in [2, 3], in that the
dynamics do not have a “diffusive” component, and hence Condition 1.1 does not
hold.  For jump Markov models, the process only moves when a jump occurs, and
there is no continuous change of position.  For these processes the rate function is
bounded, whereas for the processes of [2, 3] the rate function is infinity when the
target measure is not absolutely continuous with respect to the reference measure.
We now consider the source and implications of this distinction.

Consider a process satisfying all the conditions in Section 2 that has $\pi$ as its
invariant distribution.  In order to hit a different probability measure $\eta \in \mathcal{P}(S)$, we
need to perturb the original dynamics, which includes the distortion of the Markov
chain transition probability $\alpha$ and the distortion of the exponential holding time
$\sigma$.  Each of these distortions must pay a relative entropy cost, and the minimum of
the (suitably normalized) sum of these costs asymptotically approximates the rate
function $I(\eta)$.  When $\eta$ is singular with respect to $\pi$, the relative entropy cost from
the distortion of $\alpha$ can be made arbitrarily small, and the rate function is almost
entirely due to contributions coming from the distortion of $\sigma$.  We will illustrate this
point via the following example.

Recall the model mentioned in the Introduction, where the state space $S$ is $[0, 1]$,
the jump intensity is $q \equiv 1$, and for each $x \in [0, 1]$, $\alpha(x, \cdot)$ is the uniform distribution
on $[0, 1]$.  The invariant distribution $\pi$ is just the uniform distribution on $[0, 1]$.  Now
consider a Dirac measure $\eta = \delta_{1/2}$ as a target measure.  $\eta$ is not absolutely continuous
with respect to $\pi$.  However, we can approximate $\eta$ weakly via a sequence of
probability measures that are absolutely continuous with respect to $\pi$.  For each $n \in \mathbb{N}$ with $n \ge 2$
define a probability measure $\eta^n$ by its Radon-Nikodym derivative $\theta_n$ with respect to
$\pi$ according to

$$
\theta_n(x) =
\begin{cases}
n - 1 & \text{for } x \in \left( \frac{1}{2} - \frac{1}{2n}, \frac{1}{2} + \frac{1}{2n} \right), \\
\frac{1}{n-1} & \text{otherwise.}
\end{cases}
$$

Using the formula (3.2) for the rate function, we have

$$I(\eta^n) = 1 - \left( \int_0^1 (\theta_n(x))^{1/2}\,dx \right)\left( \int_0^1 (\theta_n(y))^{1/2}\,dy \right) = 1 - \frac{4(n-1)}{n^2}.$$

According to the definition of the rate function in Section 3.1, the rate function is bounded
above by 1.  However $I(\eta^n) \to 1$ as $n \to \infty$, and one can check that this is true for any
sequence of absolutely continuous measures converging weakly to $\eta$.  Thus $I(\eta) = 1$.

We now consider fixed $n \in \mathbb{N}$ and examine the perturbed dynamics that can hit the
measure $\eta^n$.  This is most easily understood by examining the minimizer in the
variational formula for the rate function, whose form was suggested during the proof of the
Laplace principle lower bound in Section 5.  Recall that $\bar\sigma_i(\cdot)$ and $\bar\alpha_i(\cdot)$ are perturbed
dynamics for the exponential holding time and the Markov chain.  $\bar\sigma_i(\cdot)$ depends on
$\{\bar X_0, \bar\tau_1, \bar X_1, \bar\tau_2, \ldots, \bar X_{i-1}\}$ and $\bar\alpha_i(\cdot)$ depends on $\{\bar X_0, \bar\tau_1, \bar X_1, \bar\tau_2, \ldots, \bar\tau_i\}$.  $\bar\tau_i$ and
$\bar X_i$ are chosen recursively according to stochastic kernels $\bar\sigma_i(\cdot)$ and $\bar\alpha_i(\cdot)$.  Specifically,
$\bar s_i$ is defined by (1.10) using $\bar X_i$ and $\bar\tau_i$, $R_T$ is defined by (4.2) using $\bar s_i$, and $\bar\eta_T$ is
defined by (4.3) using $\bar X_i$, $\bar\tau_i$ and $R_T$.  Following the procedure in Section 5, we first
define $\mu \in \mathcal{P}(S \times S)$ as in (4.35).  Thus $\mu$ is a product measure.  As before, we
use $[\mu]_1$ to denote the first marginal of $\mu$ and $p$ to denote the regular conditional
probability such that $\mu = [\mu]_1 \otimes p$.  Since $\mu$ is a product measure defined by (4.35),
$[\mu]_1$ and $p$ are in fact the same measure and the density with respect to $\pi$ can be
calculated as

$$
\frac{d[\mu]_1}{d\pi}(x) =
\begin{cases}
\frac{n}{2} & \text{for } x \in \left( \frac{1}{2} - \frac{1}{2n}, \frac{1}{2} + \frac{1}{2n} \right), \\
\frac{n}{2(n-1)} & \text{otherwise.}
\end{cases}
\qquad (6.1)
$$


As in Section 5 let $\bar\alpha_i = p$ for each $i$.  A direct calculation of $A$ using formula (4.43)
shows that $A = 4(n-1)/n^2$.  Also, $\kappa$ defined in (5.4) reduces to

$$
\kappa(x) =
\begin{cases}
\frac{n}{2(n-1)} & \text{for } x \in \left( \frac{1}{2} - \frac{1}{2n}, \frac{1}{2} + \frac{1}{2n} \right), \\
\frac{n}{2} & \text{otherwise.}
\end{cases}
$$

As in Section 5, $\bar\sigma_i$ should be the exponential distribution with mean $[A\kappa(\bar X_{i-1})]^{-1}$.
Hence if $\bar X_{i-1}$ falls into $(1/2 - 1/(2n), 1/2 + 1/(2n))$, $\bar\sigma_i$ would be the exponential
distribution with mean $n/2$, otherwise $\bar\sigma_i$ would be the exponential distribution with
mean $n/[2(n-1)]$.  Now the perturbed Markov jump process, denoted by $\bar X(t)$, is
constructed using $\bar\alpha_i$ and $\bar\sigma_i$ defined as above.  As proved in Lemma 5.2, the expected
value of the relative entropy cost


l 

-^2(R(ai~i  || a)  +  R{vi  ||cr)) 

i=  1 


converges to

$$I(\eta^n) = A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha) + \int_0^1 \ell(A\kappa(x))\,\eta^n(dx)$$

as $T \to \infty$.  We have noted that $p(x, dy) = [\mu]_1(dy)$ and $\alpha(x, dy) = \pi(dy)$, and by
using (6.1)

$$
\begin{aligned}
A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha)
&= A \int_0^1 R(p(x, \cdot) \,\|\, \alpha(x, \cdot))\,[\mu]_1(dx) \\
&= \frac{4(n-1)}{n^2}\left( \log n - \log 2 - \frac{\log(n-1)}{2} \right).
\end{aligned}
$$

This converges to 0 as $n \to \infty$.  Hence the relative entropy cost that comes from the
distortion of the Markov chain converges to 0.  For the second term, we have

$$\int_0^1 \ell(A\kappa(x))\,\eta^n(dx) = \frac{2(n-1)}{n^2}\left( \log(n-1) + 2\log 2 - 2\log n \right) - \frac{4(n-1)}{n^2} + 1,$$
which converges to 1 as $n \to \infty$.  Thus as $\eta^n$ approaches the target distribution $\eta$, the
relative entropy cost that comes from the distortion of the Markov chain vanishes, and
the rate function becomes solely determined by the relative entropy cost that comes
from the distortion of exponential waiting times.
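The quantities in this example are piecewise constant, so they can be recomputed directly from the definitions. The sketch below evaluates $A$ from (4.43), the Markov chain cost $A\,R(\mu \,\|\, [\mu]_1 \otimes \alpha)$ and the holding time cost $\int_0^1 \ell(A\kappa)\,d\eta^n$ for the density $\theta_n$ taking the value $n-1$ on the interval of length $1/n$ and $1/(n-1)$ elsewhere. Variable names are illustrative assumptions.

```python
import math


def ell(x):
    """ell(x) = x log x - x + 1 for x > 0."""
    return x * math.log(x) - x + 1.0


def example_costs(n):
    """Return (A, I(eta^n), chain_cost, holding_cost) for the example with
    S = [0, 1], q = 1, alpha(x, .) uniform, and integer n >= 2."""
    w_in, w_out = 1.0 / n, 1.0 - 1.0 / n          # pi-mass of the interval and its complement
    th_in, th_out = n - 1.0, 1.0 / (n - 1.0)      # values of theta_n
    root = math.sqrt(th_in) * w_in + math.sqrt(th_out) * w_out
    A = root * root                                # (4.43) with q = 1
    rate = 1.0 - A                                 # I(eta^n) from (3.2)
    m_in, m_out = math.sqrt(th_in) / root, math.sqrt(th_out) / root   # d[mu]_1/d pi, cf. (6.1)
    chain_cost = A * (m_in * w_in * math.log(m_in) + m_out * w_out * math.log(m_out))
    k_in, k_out = m_in / th_in, m_out / th_out     # kappa from (5.4) with Q = 1
    holding_cost = ell(A * k_in) * th_in * w_in + ell(A * k_out) * th_out * w_out
    return A, rate, chain_cost, holding_cost
```

For every $n \ge 2$ the two costs add up to $I(\eta^n) = 1 - 4(n-1)/n^2$; as $n$ grows the chain cost tends to 0 and the holding time cost tends to 1, in line with the discussion above.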

One  can  generalize  the  argument  to  more  general  discrete  target  measures,  where 
one  utilizes  the  original  dynamics  to  make  sure  neighborhoods  of  the  various  points 
are  visited,  and  then  uses  the  time  dilation  to  control  their  relative  weight. 


7  Appendix 

7.1  Proof  of  inequality  (4.30) 

Proof. Recall that $R(\mu\,\|\,[\mu]_1\otimes\alpha) < \infty$, where $[\mu]_1 = [\mu]_2$ and $\pi$ is invariant under
$\alpha$. Additionally, we also have Condition 2.6, i.e., there exists an integer $N$ and a real
number $c \in (0, \infty)$ such that

\[
\alpha^{(N)}(x,\cdot) \le c\,\pi(\cdot) \tag{7.1}
\]

for all $x \in S$. Now let $p$ be the regular conditional probability such that $\mu = [\mu]_1\otimes p$.
Then

\[
R(\mu\,\|\,[\mu]_1\otimes\alpha) = R([\mu]_1\otimes p\,\|\,[\mu]_1\otimes\alpha) < \infty.
\]

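The chain rule of relative entropy implies that

\[
R\Bigl([\mu]_1\otimes\underbrace{p\otimes\cdots\otimes p}_{N}\,\Big\|\,[\mu]_1\otimes\underbrace{\alpha\otimes\cdots\otimes\alpha}_{N}\Bigr)
= N\cdot R(\mu\,\|\,[\mu]_1\otimes\alpha) < \infty. \tag{7.2}
\]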


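Indeed, since $[\mu]_1$ is invariant under $p$, for any integer $n$ the $n$-th marginal of
$[\mu]_1\otimes\underbrace{p\otimes\cdots\otimes p}_{n-1}$ is

\[
\Bigl[[\mu]_1\otimes\underbrace{p\otimes\cdots\otimes p}_{n-1}\Bigr]_n = [\mu]_1.
\]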


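Hence (7.2) follows by induction:

\[
\begin{aligned}
&R\Bigl([\mu]_1\otimes\underbrace{p\otimes\cdots\otimes p}_{n}\,\Big\|\,[\mu]_1\otimes\underbrace{\alpha\otimes\cdots\otimes\alpha}_{n}\Bigr) \\
&\quad= R\Bigl([\mu]_1\otimes\underbrace{p\otimes\cdots\otimes p}_{n-1}\,\Big\|\,[\mu]_1\otimes\underbrace{\alpha\otimes\cdots\otimes\alpha}_{n-1}\Bigr)
 + \int_S R(p\,\|\,\alpha)\,d\Bigl[[\mu]_1\otimes\underbrace{p\otimes\cdots\otimes p}_{n-1}\Bigr]_n \\
&\quad= (n-1)\cdot R([\mu]_1\otimes p\,\|\,[\mu]_1\otimes\alpha) + \int_S R(p\,\|\,\alpha)\,d[\mu]_1 \\
&\quad= n\cdot R([\mu]_1\otimes p\,\|\,[\mu]_1\otimes\alpha).
\end{aligned}
\]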
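
The identity (7.2) can be sanity-checked on a finite state space. The sketch below is only an illustration under stated assumptions: the 3-state kernels `p` and `alpha` are arbitrary hypothetical choices, `mu1` plays the role of $[\mu]_1$ and is obtained by power iteration, and the path measures are enumerated by brute force.

```python
import itertools
import math

def stationary(kernel, iters=2000):
    """Approximate the invariant distribution of a positive stochastic matrix
    by power iteration, starting from the uniform distribution."""
    n = len(kernel)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * kernel[x][y] for x in range(n)) for y in range(n)]
    return mu

def path_prob(init, kernel, path):
    """Probability of a path under initial law `init` and transition kernel `kernel`."""
    prob = init[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= kernel[x][y]
    return prob

def path_relative_entropy(init, p, alpha, steps):
    """Relative entropy between the laws of (x_0, ..., x_steps) built from
    `init` followed by `steps` copies of p, respectively of alpha."""
    n = len(init)
    total = 0.0
    for path in itertools.product(range(n), repeat=steps + 1):
        a = path_prob(init, p, path)
        b = path_prob(init, alpha, path)
        total += a * math.log(a / b)
    return total

# Hypothetical strictly positive kernels (rows sum to one).
p = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.25, 0.25, 0.5]]
alpha = [[0.2, 0.4, 0.4], [0.5, 0.25, 0.25], [0.3, 0.3, 0.4]]

mu1 = stationary(p)   # invariant under p, as the argument requires
N = 4
one_step = path_relative_entropy(mu1, p, alpha, 1)
n_step = path_relative_entropy(mu1, p, alpha, N)
print(n_step, N * one_step)
```

The two printed numbers should agree up to rounding. Replacing `mu1` by a distribution that is not invariant under `p` generally breaks the equality, which is why invariance is used in the induction step.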


Let $[\nu]_{k|j}$ denote the conditional probability of the $k$-th argument of $\nu$ given the $j$-th
argument of $\nu$. Note that one can define a mapping from $\mathcal{P}(S^{N+1})$ to $\mathcal{P}(S^2)$ such
that each $\nu \in \mathcal{P}(S^{N+1})$ is mapped to $[\nu]_1\otimes[\nu]_{N+1|1}$. Since the relative entropy of
induced measures is never larger, (7.2) implies

\[
R\bigl([\mu]_1\otimes p^{(N)}\,\big\|\,[\mu]_1\otimes\alpha^{(N)}\bigr) < \infty.
\]

Now since $[\mu]_1$ is invariant under $p$, it is also invariant under $p^{(N)}$, and therefore
$[[\mu]_1\otimes p^{(N)}]_2 = [\mu]_1$. Using the chain rule of relative entropy again gives

\[
R\bigl([\mu]_1\,\big\|\,[[\mu]_1\otimes\alpha^{(N)}]_2\bigr) < \infty.
\]

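This implies (4.30), since

\[
\begin{aligned}
\infty &> R\bigl([\mu]_1\,\big\|\,[[\mu]_1\otimes\alpha^{(N)}]_2\bigr) \\
&= R([\mu]_1\,\|\,\pi) - \int_S \log\frac{d[[\mu]_1\otimes\alpha^{(N)}]_2}{d\pi}\,d[\mu]_1 \\
&\ge R([\mu]_1\,\|\,\pi) - \log c,
\end{aligned}
\]

where $c$ is from (7.1). ■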
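
The last chain of relations can be illustrated on a finite state space. This is a minimal sketch under stated assumptions: the kernel `alpha`, the choice `N = 2`, and the distribution `m` (standing in for $[\mu]_1$) are hypothetical, and `c` is taken as the smallest constant for which the analogue of (7.1) holds.

```python
import math

def stationary(kernel, iters=2000):
    """Invariant distribution of a positive stochastic matrix via power iteration."""
    n = len(kernel)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * kernel[x][y] for x in range(n)) for y in range(n)]
    return mu

def compose(a, b):
    """Product of two stochastic matrices (two-step transition kernel)."""
    n = len(a)
    return [[sum(a[x][z] * b[z][y] for z in range(n)) for y in range(n)]
            for x in range(n)]

def relative_entropy(a, b):
    """R(a || b) for distributions on a finite set, with b strictly positive."""
    return sum(u * math.log(u / v) for u, v in zip(a, b) if u > 0)

alpha = [[0.2, 0.4, 0.4], [0.5, 0.25, 0.25], [0.3, 0.3, 0.4]]  # hypothetical kernel
pi = stationary(alpha)
alpha_N = compose(alpha, alpha)          # N = 2
states = range(len(pi))

# Smallest c with alpha^(N)(x, .) <= c * pi(.), the finite analogue of (7.1).
c = max(alpha_N[x][y] / pi[y] for x in states for y in states)

m = [0.7, 0.2, 0.1]                      # stand-in for [mu]_1
second = [sum(m[x] * alpha_N[x][y] for x in states) for y in states]

lhs = relative_entropy(m, second)
middle = relative_entropy(m, pi) - sum(m[y] * math.log(second[y] / pi[y]) for y in states)
lower = relative_entropy(m, pi) - math.log(c)
print(lhs, middle, lower)
```

Here `lhs` and `middle` coincide, and both dominate `lower`, because the density of the second marginal with respect to `pi` is bounded by `c`.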


7.2  Proof  of  Lemma  5.1 

The  proof  of  the  representation  is  standard,  save  for  the  fact  that  Rt  is  random.  We 
include  a  proof  here  for  completeness. 

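Proof. Define for each $k \in \mathbb{N}_+$

\[
\eta_T^k(\cdot) = \frac{1}{T}\left[\sum_{i=1}^{R_T\wedge k-1}\delta_{X_{i-1}}(\cdot)\,\frac{\tau_i}{q(X_{i-1})}
+ \delta_{X_{R_T\wedge k-1}}(\cdot)\left(T - \sum_{i=1}^{R_T\wedge k-1}\frac{\tau_i}{q(X_{i-1})}\right)\right].
\]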

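For any measure $\nu^k \in \mathcal{P}((S\times\mathbb{R}_+)^k)$, we can decompose $\nu^k$ as

\[
\nu^k = \bar\alpha_0\otimes\bar\sigma_1\otimes\bar\alpha_1\otimes\bar\sigma_2\otimes\cdots\otimes\bar\alpha_{k-1}\otimes\bar\sigma_k. \tag{7.3}
\]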

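Choose the barred random variables $\bar X_i$ and $\bar\tau_i$ according to $\bar\alpha_i$ and $\bar\sigma_i$ as before and
define the corresponding $\bar R_T\wedge k$ in the following way: if $\sum_{i=1}^{k}\bar\tau_i/q(\bar X_{i-1}) > T$, then
$\bar R_T\wedge k = \bar R_T$, where $\bar R_T$ is the integer that satisfies

\[
\sum_{i=1}^{\bar R_T-1}\frac{\bar\tau_i}{q(\bar X_{i-1})} \le T < \sum_{i=1}^{\bar R_T}\frac{\bar\tau_i}{q(\bar X_{i-1})};
\]

otherwise define $\bar R_T\wedge k = k$. We also define

\[
\bar\eta_T^k(\cdot) = \frac{1}{T}\left[\sum_{i=1}^{\bar R_T\wedge k-1}\delta_{\bar X_{i-1}}(\cdot)\,\frac{\bar\tau_i}{q(\bar X_{i-1})}
+ \delta_{\bar X_{\bar R_T\wedge k-1}}(\cdot)\left(T - \sum_{i=1}^{\bar R_T\wedge k-1}\frac{\bar\tau_i}{q(\bar X_{i-1})}\right)\right].
\]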
If we denote the multi-dimensional probability measure corresponding to the original
dynamics by $\mu^k \in \mathcal{P}((S\times\mathbb{R}_+)^k)$, i.e.,

\[
\mu^k = \underbrace{\alpha\otimes\sigma\otimes\cdots\otimes\alpha\otimes\sigma}_{k\ \text{pairs}},
\]


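then applying Lemma 3.2 gives

\[
-\frac{1}{T}\log E\bigl[\exp\{-TF(\eta_T^k)\}\bigr]
= \inf_{\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)}\left[\int_{(S\times\mathbb{R}_+)^k} F(\eta_T^k)\,d\nu^k + \frac{1}{T}R(\nu^k\,\|\,\mu^k)\right]. \tag{7.5}
\]

By applying Theorem 3.3 repeatedly to $R(\nu^k\,\|\,\mu^k)$ we obtain

\[
R(\nu^k\,\|\,\mu^k) = E\left[\sum_{i=1}^{k}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right].
\]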

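We can thus rewrite (7.5) as

\[
-\frac{1}{T}\log E\bigl[\exp\{-TF(\eta_T^k)\}\bigr]
= \inf_{\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)} E\left[F(\bar\eta_T^k) + \frac{1}{T}\sum_{i=1}^{k}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right]. \tag{7.6}
\]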

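Now for each $\nu^k \in \mathcal{P}((S\times\mathbb{R}_+)^k)$, we construct another measure $\tilde\nu^k \in \mathcal{P}((S\times\mathbb{R}_+)^k)$
recursively as follows: define $\tilde\alpha_0 = \bar\alpha_0$ and $\tilde\sigma_1 = \bar\sigma_1$. For all $2 \le i \le k$, define $\tilde\alpha_{i-1}$ and
$\tilde\sigma_i$ by

\[
(\tilde\alpha_{i-1}, \tilde\sigma_i) =
\begin{cases}
(\bar\alpha_{i-1}, \bar\sigma_i) & \text{if } \sum_{j=1}^{i-1}\bar\tau_j/q(\bar X_{j-1}) \le T, \\
(\alpha, \sigma) & \text{otherwise.}
\end{cases}
\]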

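Thus we return to the original dynamics with zero relative entropy cost after $\bar R_T$.
Define $\tilde\nu^k$ using $\tilde\alpha_i$ and $\tilde\sigma_i$ by (7.3). From the definition (7.4) we have $E[F(\tilde\eta_T^k)] =
E[F(\bar\eta_T^k)]$, and

\[
\begin{aligned}
E\left[\sum_{i=1}^{k}\bigl(R(\tilde\alpha_{i-1}\,\|\,\alpha) + R(\tilde\sigma_i\,\|\,\sigma)\bigr)\right]
&= E\left[\sum_{i=1}^{\tilde R_T\wedge k}\bigl(R(\tilde\alpha_{i-1}\,\|\,\alpha) + R(\tilde\sigma_i\,\|\,\sigma)\bigr)\right] \\
&= E\left[\sum_{i=1}^{\bar R_T\wedge k}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right] \\
&\le E\left[\sum_{i=1}^{k}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right].
\end{aligned}
\]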


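Hence we can rewrite (7.6) as

\[
-\frac{1}{T}\log E\bigl[\exp\{-TF(\eta_T^k)\}\bigr]
= \inf_{\nu^k\in\mathcal{P}((S\times\mathbb{R}_+)^k)} E\left[F(\bar\eta_T^k) + \frac{1}{T}\sum_{i=1}^{\bar R_T\wedge k}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right]. \tag{7.7}
\]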
Using the pointwise convergence of both $R_T\wedge k \to R_T$ and $\bar R_T\wedge k \to \bar R_T$ as $k \to \infty$,
by the dominated convergence theorem

\[
\lim_{k\to\infty} -\frac{1}{T}\log E\bigl[\exp\{-TF(\eta_T^k)\}\bigr] = -\frac{1}{T}\log E\bigl[\exp\{-TF(\eta_T)\}\bigr],
\qquad
\lim_{k\to\infty} E\bigl[F(\bar\eta_T^k)\bigr] = E\bigl[F(\bar\eta_T)\bigr].
\]

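Also, by the monotone convergence theorem

\[
\lim_{k\to\infty} E\left[\sum_{i=1}^{\bar R_T\wedge k}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right]
= E\left[\sum_{i=1}^{\bar R_T}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right].
\]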


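Hence by taking limits on both sides of (7.7), we arrive at

\[
-\frac{1}{T}\log E\bigl[\exp\{-TF(\eta_T)\}\bigr]
= \inf E\left[F(\bar\eta_T) + \frac{1}{T}\sum_{i=1}^{\bar R_T}\bigl(R(\bar\alpha_{i-1}\,\|\,\alpha) + R(\bar\sigma_i\,\|\,\sigma)\bigr)\right],
\]

where the infimum is taken over all controlled measures $\{\bar\alpha_i, \bar\sigma_i\}$. This proves the
lemma. ■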
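
The representation rests on the standard relative-entropy variational formula $-\log E_\mu[e^{-f}] = \inf_\nu\{\int f\,d\nu + R(\nu\,\|\,\mu)\}$, which is presumably the content of Lemma 3.2 (that identification is an assumption here). The sketch below checks the formula on a three-point space; the base measure `mu`, the cost `f`, and all function names are hypothetical.

```python
import math
import random

def relative_entropy(nu, mu):
    """R(nu || mu) on a finite set, with mu strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(nu, mu) if a > 0)

def free_energy(f, mu):
    """-log E_mu[exp(-f)]."""
    return -math.log(sum(m * math.exp(-v) for v, m in zip(f, mu)))

def objective(nu, f, mu):
    """Integral of f against nu plus the relative entropy cost R(nu || mu)."""
    return sum(a * v for a, v in zip(nu, f)) + relative_entropy(nu, mu)

def minimizer(f, mu):
    """The exponentially tilted measure, proportional to exp(-f) * mu."""
    weights = [m * math.exp(-v) for v, m in zip(f, mu)]
    total = sum(weights)
    return [w / total for w in weights]

mu = [0.5, 0.3, 0.2]   # hypothetical base measure
f = [1.0, 0.2, 2.5]    # hypothetical bounded cost

lhs = free_energy(f, mu)
best = objective(minimizer(f, mu), f, mu)

# Random competitors: none should beat the tilted measure.
random.seed(0)
trials = []
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in mu]
    total = sum(raw)
    trials.append(objective([r / total for r in raw], f, mu))

print(lhs, best, min(trials))
```

The first two printed values agree, and the smallest objective among the random competitors is no smaller. In the proof the same formula is applied on $(S\times\mathbb{R}_+)^k$, and the chain rule then splits the entropy cost into the per-jump terms.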